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The past decade has ushered in a new era in the study of fern and lycophyte 
genomes. In 2011, the first nuclear genome of a lycophyte, Selaginella 
moellendorffii, was sequenced and assembled (Banks et al., 2011). It would 
take another seven years of technological advances and reduction in 
sequencing costs before the first nuclear genome assemblies for ferns were 
published in the summer of 2018 (Li et al., 2018). Additional Selaginella 
nuclear genomes have been sequenced and assembled with more on the way 
(VanBuren et al., 2018; Xu et al., 2018). Given these new resources, it seemed 
an appropriate time to ask how we got to this stage and where genomic 
research on ferns and lycophytes may go over the next few years. 

Although a complete (or close to complete) genome is a unique resource 
with many applications, these published sequences are not our first glimpses 
into the genomic world of ferns. The earliest observations of fern genomes 
came at the chromosomal level, by the pioneering work of Irene Manton 
(Manton, 1950), who advanced a squash technique and was one of the first 
cytologists to examine chromosome numbers and their behavior in ferns. 
Manton’s discoveries led to a wide range of studies on variation in fern life 
cycles and life history patterns. One of the significant insights from these 
studies was that fern genomes are, on average, partitioned among more 
chromosomes than those of seed plants (Klekowski and Baker, 1966). Later in 
the twentieth century, Diana Stein was the first to explore fern nuclear 
genomes in ferns by examining reassociation kinetics of repetitive elements in 
Osmunda (Stein and Thompson, 1975; Stein, Thompson, and Belford, 1979). 
These studies were soon followed by the first experiments exploring the 
physical structure of chloroplast genomes (plastomes) in ferns (Palmer and 
Stein 1982). The relative ease with which plastomes can be studied, relative to 
nuclear genomes, resulted in a decades-long focus on structural rearrange- 
ments (Raubeson and Stein 1995; Stein et al. 1992), and subsequently 
nucleotide variation (Hasebe et al., 1994; Hasebe et al., 1995; Wolf, 1997) in 
fern and lycophyte plastomes. Eventually, plastome structure and nucleotide 
variation was being examined simultaneously using complete plastome 
sequences (e.g., Gao et al., 2009; Labiak and Karol, 2017; Wolf et al., 2011: 
Wolf et al., 2003). 
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Although great strides were made in understanding fern and lycophyte 
phylogeny, the focus on plastome data meant that fewer researchers were 
delving into the complexity of their nuclear genomes. Inferences from 
plastome data are often clear, and resulting phylogenetic estimates can be 
well-resolved, but these gratifying outcomes often obscure much of the 
underlying evolutionary complexity. Because hybrids usually inherit plas- 
tomes from only one parent, complexity in phylogenies resuiting from 
reticulations are avoided, as are clues to the parentage of hybrids. Such 
information can be revealed more easily using data from nuclear-encoded 
markers. Earlier studies used the products of nuclear genes (e.g., isozymes) to 
explore population genetics (e.g., Gastony and Gottlieb 1985) and hybridiza- 
tion (Barrington, Haufler, and Werth,, 1989; Werth, Guttman, and Eshbaugh, 
1985a; Werth, Guttman, and Eshbaugh, 1985b) in ferns. Development of 
nuclear markers at the nucleotide level in ferns was more recent (Ishikawa et 
al., 2002; Rothfels et al., 2015b; Schuettpelz et al., 2008) aided by resources 
such as transcriptomes (Rothfels et al., 2013). The addition of data from 
nuclear genes enabled discovery and analysis of evolutionary patterns in ferns, 
such as hybrids between highly diverged genera (Rothfels et al., 2015a). 
Pteridologists were learning rapidly about how divergent genomes interact. 
Yet, understanding the structure of fern genomes requires more than just 
sequences from a few genes. The first insights at the genome level came from 
the first genetic map of a fern, that of Ceratopteris richardii (Nakazato et al., 
2006). Although this map revealed statistical evidence of synteny consistent 
with a whole genome duplication (WGD), large syntenic blocks providing 
strong evidence of a WGD were not detected. This was unexpected because 
one hypothesis for high chromosome numbers in homosporous ferns is a long 
evolutionary history of polyploidy (Haufler, 1987). Analyses of transcriptomes, 
genome sizes, and chromosome numbers have provided explanation for this 
result. The distribution of chromosome numbers in a phylogenetic context 
suggest that about 31% of speciation events in ferns are associated with 
polyploidy, compared to about 15% in seed plants (Wood et al., 2009). 
However, the strong correlation between chromosome number and genome 
size in ferns (Clark et al., 2016; Nakazato et al., 2008; Sessa and Der, 2016) 
suggests that ferns have less dynamic genomes than seed plants, and that 
whole genome duplication (WGD) events have not occurred at a higher rate in 
ferns. Furthermore, the distribution of duplicated genes detected in tran- 
scriptomes also suggests that ferns have not undergone more WGDs than seed 
plants (Barker, 2013; Barker and Wolf, 2010). An alternative explanation for 
higher chromosome numbers is that ferns do not lose genetic material between 
cycles of WGD, at least at the rates seen-in seed plants (Baniaga, Arrigo, and 
Barker, 2016; Barker, 2013; Barker and Wolf, 2010). What we currently lack is 
an understanding of the association between high chromosome number and 
homospory. Ultimately, the answer may lie in detailed dissections of complete 
nuclear genomes. Hopefully, such comparative analyses of the genomes of 
homosporous and heterosporous ferns and lycophytes may aid in explaining 
an observation made over 50 years ago by Klekowski and Baker (1966) 
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The need for fern and lycophyte genomic resources has been recognized for 
some time (Pryer et al., 2002; Sessa et al., 2014). But because of the high costs 
involved, early explorations focused only on expressed regions (Der et al., 
2011; Matasci et al., 2014) or low-coverage genome skimming (Wolf et al., 
2015). The major cost in most genome sequence projects is for analysis, rather 
than the short-read sequencing. Assembling the resulting contiguous regions 
into complete, or nearly complete, chromosome scaffolds requires a consid- 
erable investment of resources, and these increase in cost as does the size and 
complexity of the genome. Although traditional funding sources should not be 
ruled out, some imaginative thinking led to a crowdfunding approach for the 
genomes of Azol/a and Salvinia (Li and Pryer, 2014). Five years later, we are 
seeing the fruits of these investments (Li et a/., 2018). The attraction of Azolla 
for a crowdfunding approach results from several key characteristics that are 
not typical of other ferns. Unlike most homosporous species, members of the 
heterosporous clade, to which Azolla and Salvinia belong, typically have small 
genomes: 0.75 gigabases (Gb) for A. filiculoides and 0.26 Gb for S. cucullata. 
An additional attraction to Azolla is that it contains nitrogen-fixing 
endosymbionts that enable the plant to grow in low nitrogen environments, 
and as a result. Azolla has been used as fertilizer for rice for over 1000 years. 
Providing further interest in Azolla is the documented “Azolla Event” whereby 
approximately 50 mya, a massive global-scale bloom of Azolla lead to so much 
sequestration of atmospheric carbon that it resulted in the globally warm 
greenhouse planet shifting to the current cooler climate. The economic value 
combined with the unusual evolutionary history of Azolla certainly helped in 
the funding approach, but perhaps more importantly, this type of funding 
helped engage the public in fern research. It is likely that continued public 
outreach will be needed in pteridological research. 

The time seems appropriate for dedicating a special issue of the American 
Fern Journal to the topic of fern and lycophyte genomics. Genomics in general, 
and all its subdisciplines, have infiltrated almost every field of biology, making 
attempts to isolate fern genomics as a distinct research arena challenging. Here, 
we sought to gather contributions that illustrate the current state of the field, 
defined in its broadest sense. We also aimed to gather mostly early career 
contributors, who represent the future of fern and lycophyte research. 

We start with two review papers, one by Marchant (2019), focusing on 
Ceratopteris richardii as a model experimental system for reproductive 
biology, teaching, and more; and how these areas will be enhanced by a fern 
genome assembly. The contribution by Petlewski and Li (2019) takes this 
further and places C. richardii in a broader context with other potential fern 
model systems alongside corresponding genomic resources. Kuo and Li (2019) 
provide new data on genome sizes in ferns and suggest what might be the most 
fruitful genome sequencing projects for the future. Sigel et al. (2019) present 
data from transcriptomes of Polypodium hybrids and their parents, illustrating 
that the hybrids may not be perfectly balanced in the expression of genes from 
the different parents. Baniaga and Barker (2019) examine the age distribution 
of long terminal repeat retrotransposons (LTRs) in fern and lycophyte 
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genomes. They find that species with smaller genomes tend to have more 
recent LTR insertions. Thus, it seems that, as in seed plants, genome size can 
be a function of LTR insertions, and not just chromosome number, The 
contribution by Kinosian et al. (2019), uses a genomic approach (polymor- 
phisms gathered from RADseq) to examine hybrid origins in Dryopteris. 
Despite success of this approach in other fern genera, the results did not see 
clear additivity of nuclear genomic contributions of putative parents to 
allopolyploids. These findings are discussed in the context of sample size, 
species divergence times, methodological choices and sequencing options. 

The future of fern genomics, and pteridology in general, is exceptionally 
bright, thanks to expanding genomic resources, and more importantly, highly 
talented fern and lycophyte biologists at the forefront of this research. We hope 
the readers of this issue are as inspired by reading the articles as we were 
during the editorial process. 
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Ferns with Benefits: Incorporating Ceratopteris into 
the Genomics Era 
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ApstrAcTt.—Ferns were the last major lineage of land plants without a reference genome. With the 
coming publication of the Ceratopteris genome assembly and recent publication of two water fern 
genomes, it is time to review the past and future applications of Ceratopteris and fern research. 
Here I delve into the utilization of a Ceratopteris gnome assembly for studying the alternation of 
generations, reproductive biology, genome biology, and the C-Fern Curriculum. 


Key Worps.—genomics, alternation of generations, reproductive biology, curriculum development 


INTRODUCTION 


Why study ferns? That is a question frequently asked of those plant 
biologists brave enough to stride into the seedless world, but not an 
unreasonable one. Consider the most common crops — corn, wheat, barley, 
rice, soy, etc. — all flowering plants. What about the more charismatic flora — 
those plants one would find in Instagram photos or growing on the 
windowsill? Unfortunately, all redwoods, oak trees, orchids, and sunflowers 
— flowering plants or gymnosperms with nary a fern in sight. What about the 
biologist’s argument of general diversity? Indeed, ferns account for around 
10,500 species (Christenhusz and Chase, 2014; PPG I, 2016), which is the 
second most species rich group among the major clades of vascular plants 
(angiosperms, gymnosperms, ferns, lycophytes). However, ferns seem far less 
diverse when compared to the ~400,000 species of angiosperms. A clade of 
10,000 species would be considered a medium sized family among flowering 
plants, and just a fraction of the largest families such as composites and 
orchids (~30,000 species each). It is reasonable to ask again, “why study 
ferns?” 

It can be argued that any knowledge of the natural world has innate value 
and the future potential of that knowledge, for whatever usage, is unknown 
and therefore limitless. More pragmatically, though, ferns are of special 
interest from a number of basic and applied perspectives. For one, ferns are the 
sister clade to the seed plants and are therefore critical for understanding the 
origin and evolution of many plant features (including, but not limited to, 
flowers, fruits, and seeds). By understanding how these traits arose and 
evolved we can better elucidate how they function. This argument is 
essentially the same reason why scientists study chimpanzees and bonobos 
in order to make inferences about human evolution — it is all about providing 


184 AMERICAN FERN JOURNAL: VOLUME 109, NUMBER 3 (2019) 


reference, or rooting, among branches of the tree of life. In addition, ferns have 
a number of life history traits that make them uniquely qualified for answering 
questions regarding reproductive biology and the impact of ploidal level. 
Finally, ferns have a suite of genomic characteristics that have perplexed 
evolutionary biologists for decades. Here I will delve into the features that 
make ferns of particular interest to plant genetics and genomics and why one 
specific fern is an ideal model system for taking ferns into the next generation 
of plant biology. 


Whuy CERATOPTERIS? 


The genus Ceratopteris Brongn. (Pteridaceae) historically comprised four 
species (C. richardii, C. thalictroides, C. pteridoides, C. cornuta; Lloyd, 1974), 
however phylogenetic analysis and crossing tests demonstrated that C. 
thalictroides sensu lato comprised four cryptic species: C. froesii, C. 
gaudichaudii, C. oblongibloba, and C. thalictroides sensu stricto (Masuyama 
and Watano, 2010). The genus is found throughout the New and Old World 
tropics and sub-tropics in semi-aquatic habitats. All of the species are annuals 
and the leaf architecture can differ tremendously between species, among 
populations, and even on an individual as leaves vary dramatically in shape 
and form depending on plant maturity, environmental conditions, and fertility 
(Lloyd, 1974). In addition, the species of Ceratopteris differ in number of 
spores per sporangium, annulus development, and chromosome number. 
Ceratopteris richardii has both the fewest spores per sporangium (16) and the 
lowest chromosome count (n = 39) in the genus compared to 32 spores in the 
sporangia of all other species and up to 78 chromosomes in C. thalictroides 
(Hickok and Klekowski, 1974; Lloyd 1974). While the lower spore number of 
C. richardii suggests an apogamous life cycle, it undergoes normal sexual 
processes (Hickok and Klekowski, 1974). Semi-fertile hybrids between C. 
richardii, C. cornuta, and C. pteroides have been synthesized (Hickok, 1985) 
and natural hybrids exist between the species where their ranges overlap 
(Lloyd, 1974). Given the inter- and intra-specific morphological variation, 
multiple ploidal levels, and presence of natural hybrids, the genus Ceratop- 
teris is primed for phylogenetic reassessment at both the species and 
population level. 

Ceratopteris richardii (hereafter Ceratopteris) is a tropical species from 
Central and South America. It has been touted as an ideal “model” fern species 
for studying genetics, genomics, physiology, and reproductive biology due to 
its ease of propagation and short generation time (Chasan, 1992; Cooke, 
Hickok, and Sugai, 1995; Eberle et al., 1995; Hickok , Warne, and Fribourg, 
1995; Warne, Vogelien, and Hickok, 1995; Sessa et al., 2014). More 
importantly, what makes Ceratopteris an ideal model system for ferns, 
especially for modern genetic and genomic investigations, is its typical-ness 
(Sessa et al., 2014). Like most ferns, it is homosporous, and it has a haploid 
genome size of 11.25 Gb, large compared to most eukaryotic genomes, but 
average by fern standards (Wolf et al., 2015). Similarly, Ceratopteris has a 
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haploid chromosome number of 39, also high in the context of most other 
organisms, but below the average chromosome number among ferns (n = 59; 
Rice et al., 2015). The expansive genomes of Ceratopteris and homosporous 
ferns in general have hindered the publication of a fern reference genome (but 
see Li et al., 2018), however, advances in sequencing technology now make the 
exploration of these genomes feasible. With the imminent publication of the 
Ceratopteris genome (Marchant et al., in revision), now is an appropriate time 
to summarize Ceratopteris biology and highlight the research contributions 
that this newly sequenced genome may facilitate. 


ALTERNATION OF GENERATIONS 


The alternation between the haploid gametophyte and diploid sporophyte is 
central to plant biology. Gametophytes produce male and/or female gametes 
that undergo syngamy to develop into a sporophyte that produces spores via 
meiosis, which develop into gametophytes. While this life cycle is constant 
across land plants, the interdependence, reproductive mechanisms, and 
temporal dominance of these life stages varies depending on the plant lineage. 

All ferns, with the exception of the water ferns (Salviniales representing 1% 
of fern species), and 28% of lycophytes (PPG I, 2016) are homosporous and 
have both independent sporophytes and independent gametophytes (Haufler 
et al., 2016). Anatomically, a fern gametophyte consists of a prothallus (the 
primary photosynthetic structure), rhizoids (filamentous single cells that 
absorb water and nutrients), antheridia (produces sperm), and/or archegonia 
(produces eggs), while sporophytes consist of rhizome (stem), leaves, and 
roots. Although often miniscule and easily overlooked in contrast to their 
sporophyte counterparts, homosporous fern gametophytes are nevertheless 
autonomous, free-living organisms. 

Investigations of the alternation of generations in Ceratopteris have a 
substantial advantage over similar studies in seed plants due to the size and 
independence of the two life stages. Even though the Ceratopteris gametophyte 
is small (.5-2 mm; Hickok, Warne, and Slocum, 1987), it is large compared to 
the seven-celled embryo sac (megagametophyte) and three-celled micro- 
gametophyte within the pollen grain of Arabidopsis (the “model” angiosperm). 
In addition, the isolation of gametophytes in seed plants can be tedious as the 
embryo sac is entirely enclosed by sporophytic tissue and requires micro-scale 
isolation techniques to investigate. Not only are Ceratopteris gametophytes 
large enough to pick out by eye on a media plate, they can be grown and 
collected without any chance of sporophytic contamination. 

Transcriptomics to identify genes of interest and a reference genome 
assembly to pinpoint the regulatory regions of those genes will be imperative 
for investigating fern biology and development, especially alongside the ability 
to transform Ceratopteris via microparticle bombardment and Agrobacterium 
(Plackett et al., 2014; Bui et al., 2015). As an intermediate between 
gametophyte dominance in the bryophytes and sporophyte dominance in the 
seed plants, ferns provide an exceptional reference for understanding the 
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alternation of generations, plant development, and land plant evolution 
(Wellmer et al., 2014; Plackett, Di Stilio, and Langdale 2015, Plackett et al., 
2018). 


REPRODUCTIVE BIOLOGY 


Homosporous fern gametophytes can also be facultatively unisexual. 
Ceratopteris produces antheridiogen, a hormone that inhibits female function 
in gametophytes; large bisexual gametophytes can thus induce their neigh- 
boring gametophytes to be male-only (Warne and Hickok, 1991; Banks, Hickok, 
and Webb, 1993; Banks, 1994; Eberle et a/., 1995). As male-only gametophytes 
can differ dramatically from their bisexual counterparts, the transcriptional, 
epigenetic, and developmental ramifications of antheridiogen has been a 
subject of interest for understanding plant sex determination (Eberle et al., 
1995; Strain, Hass, and Banks, 2001; Atallah et al., 2018). 

Transcriptomic comparisons of male-only and bisexual Ceratopteris game- 
tophytes identified four function-based clusters of gene ontology (GO) terms 
associated with the differentially expressed genes (Atallah et al., 2018). One 
cluster was linked to tissue development, as expected by the differing 
morphologies of the two gametophyte types, and another to hormone response 
and gene expression. Interestingly, GO terms for chromatin structure and 
epigenetic regulation formed the fourth cluster, containing genes linked to 
DNA methylation, histone modifying enzymes, chromatin remodeling, and 
RNA-mediated gene silencing pathways. The epigenetic regulation and control 
of transposable elements (TEs) during sporogenesis and gametophyte devel- 
opment is of particular interest to reproductive biology as TEs can destabilize 
the genome, often disrupting genes but also occasionally producing novelty 
(Brandt et al., 2005; Feschotte et al., 2007). As a result, TE regulation is often 
linked to plant stress in flowering plants, allowing destabilization under 
stressful conditions and stability when lacking stress (reviewed in Chen et al., 
2016). Differing regulation of TEs in male-only and bisexual gametophytes of 
Ceratopteris may have profound implications in terms of linking sex-biased 
genomic stability, genome size, and sources of genetic novelty. 

Ceratopteris also provides a variety of sexual and asexual reproductive 
mechanisms, some of which are not naturally found in seed plants (Haufler et 
al., 2016). Similar to seed plants, crosses of Ceratopteris can be readily made 
using gametes from separate gametophytes originating from differing sporo- 
phytes (sporophytic crossing) or the same sporophyte (sporophytic selfing). 
Where homosporous ferns differ from seed plants is their ability to undergo 
crosses of gametes from the same gametophyte (gametophytic selfing; GS), to 
produce an entirely homozygous sporophyte (Klekowski, 1972). The natural 
ability to undergo this most extreme form of inbreeding has been hypothesized 
to play a role in the large genome sizes and numerous chromosomes typical of 
homosporous ferns, but evidence has been conflicting (Klekowski and Baker, 
1966; Klekowski, 1972; Haufler and Soltis, 1986; Soltis and Soltis, 1987; 
Haufler, 2014). 
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Beyond GS, Ceratopteris also has the ability to undergo apogamy, in which 
haploid gametophyte tissue develops into sporophytic tissue without 
syngamy. The resulting sporophyte has only the haploid chromosome number. 
Alternatively, Ceratopteris can also undergo apospory in which diploid 
gametophytes develop from sporophytic tissue (Hickok, Warne, and Fribourg, 
1995). The natural ability to produce haploid sporophytes, diploid gameto- 
phytes, and completely homozygous diploids makes Ceratopteris exceptional 
for studying the transcriptional ramifications of polyploidy, as heterozygosity 
can have confounding results when comparing expression patterns in 
polyploids and their diploid progenitors. In the case of Ceratopteris, the 
effects of gnome doubling can be entirely isolated at the haploid, diploid, and 
polyploid levels using only a single homozygous genotype with reciprocal 
ploidal origins for gametophytic and sporophytic tissue. As numerous 
breeding programs are now artificially producing doubled haploid lines 
(DHLs) and polyploids of crops, understanding the genetic effects of these 
different ploidal levels in a system where complete homozygosity and 
polyploids naturally occur can be highly informative. 


(GENOME BIOLOGY 


The use of reference genomes has revolutionized biology — whether studying 
physiology, phylogenetics, development, cellular, or molecular biology, the 
ability to delve into the genetic blueprint of an organism can be essential. While 
the forthcoming publication of the Ceratopteris genome (Marchant et al., in 
revision) will undoubtedly play a role in the fields described above and future 
investigations, the incorporation of a homosporous fern genome into land plant 
evolutionary genomics will also address decades-old questions. 

Land plants are infamous for possessing the highest variation in genome size 
and composition across the vast diversity of extant organisms. Solely among 
vascular plants, chromosome counts range from 2n = 2 to 2n = 1440, while 
genome size varies close to 2,500 fold (Bennetzen, Ma, and Devos, 2005; Rice et 
al., 2015). Ferns exemplify this genomic variation with some of the largest 
genomes and most numerous chromosomes on average (Klekowski, 1972; Rice 
et al., 2015) and understanding the processes leading to these massive 
genomes has been an ongoing effort (Klekowski and Baker, 1966; Klekowski, 
1972; Haufler, 2014). A number of life history traits have been linked with 
chromosome number and genome size, but the substantial differences in 
genomic composition between heterosporous (average n = 16 in angiosperms 
and n= 14 in heterosporous ferns and lycophytes) and homosporous (average 
n=57 in homosporous ferns and lycophytes) vascular plants have long been 
recognized (Klekowski and Baker, 1966; Klekowski, 1972) yet lack evidence of 
causation (Haufler and Soltis, 1986; Soltis and Soltis, 1987). 

One novel hypothesis linking homospory/heterospory and genome size is 
the ability of GS to rescue major chromosomal malformations (polyploidy, 
aneuploidy, inversions, fissions, fusions) that occurred during meiosis. 
Because, in individuals that undergo GS, both sperm and egg are derived 
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from the same meiotic product they will have identical chromosomal 
malformation. Thus, homologous pairing of chromosomes in the sporophyte 
following syngamy should not be impeded. Compare this to any other form of 
sexual reproduction in which a gamete with chromosomal malformations will 
lack an identical counterpart and thus be subjected to purifying selection 
during meiosis. Although GS rates have been found to be variable in sampled 
homosporous species (Soltis and Soltis, 1992), the occasional or even rare GS 
and consequent loss of negative selection on chromosomal rearrangements due 
to chromosomal mis-pairing could have profound impacts on a lineage. 
Although anecdotal, it should be noted that the genus Ophioglossum has the 
highest chromosome number variability (n = 30 — 720) of any fern and has 
entirely subterranean gametophytes, a trait associated with high rates of GS 
(Soltis, Soltis, and Holsinger, 1988; Rice et al., 2015). 

This hypothesis does not take into account possible selection on the fixed 
chromosomal change in the resultant sporophyte. Nor does it account for 
selection at the gene product scale, such as dosage effects; however, evidence 
continues to grow that gene silencing without pseudogenization whether via 
small RNAs or epigenetic mechanisms following genetic duplications are not 
as rare as previously believed (Feldman and Levy, 2012; Sehrish et al., 2014; 
Song and Chen, 2015). Chromosome-scale genome assemblies of Ceratopteris 
and other homosporous fern species alongside mutagenesis experiments could 
readily address this hypothesis. 


C-FERN CURRICULUM 


Ceratopteris can be found in K-12 and undergraduate biology classrooms 
across the globe for studying the alternation of generations via the C-Fern 
Curriculum (www.c-fern.org). Associated with the C-Fern Curriculum are a 
number of commercially distributed mutant genotypes ranging from glyph- 
osate tolerant to polka dot (clumping chloroplasts) to dark germinating spores. 
Ceratopteris is an exemplary model for teaching genetics because students can 
easily mutagenize the haploid spores using EMS or X-rays and see the 
resulting mutant phenotypes in the gametophytes (Hickok, Warne, and 
Slocum, 1987). The ability to undergo GS allows students to produce 
completely homozygous sporophytes for which any mutant phenotype will 
also be apparent. In addition, true-breeding mutant inbred lines can be 
maintained perpetually or easily be crossed with other genotypes. Mapping 
RNA-sequenced transcripts onto a reference genome in the classroom can 
make Ceratopteris and the C-Fern Curriculum the next model system for 
teaching plant genetics, genomics, breeding, physiology, and bioinformatics. 


SUMMARY 


Ceratopteris has been utilized as the fern model system for decades due to its 
short generation time, ease of propagation, and simple mutagenesis and 
screening protocols (Cooke, Hickok, and Sugal, 1995; Eberle et al., 1995; 
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Hickok, Warne, and Fribourg, 1995). However, while other plant model 
systems such as Arabidopsis, maize, and Physcomitrella have entered the 
genomics era with reference genomes and numerous genetic resources, 
Ceratopteris and ferns as a whole have been slow to gain comparable resources 
(Sessa et al., 2014; Wolf et al., 2015). The recent publication of two water fern 
genomes (Li et a/., 2018) and publication of the Ceratopteris genome (Marchant 
et al., in revision) will remedy this substantial resource gap, leveling the 
playing field between ferns and the other major lineages of land plants. As 
demonstrated by the numerous labs across the globe incorporating Ceratopteris 
and other ferns into their research, much can be learned from this plant lineage 
regarding life history traits, reproductive biology, development, evolution, and 
genome biology. 
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ApstRAct.—Ferns are one of the most speciose lineages of land plants, and occupy an important 
phylogenetic position sister to seed plants. Despite this, ferns remain one of the last groups of land 
plants that do not have a fully developed model system. Here we review the biology and status of 
each emerging fern model. While reference genomes have been completed for Azolla filiculoides 
and Salvinia cucullata, they lack transformation capability. Meanwhile, other ferns including 
Marsilea vestita, Adiantum capillus-veneris, Pteris vittata, and Ceratopteris richardii have 
transformation methods, but lack genomic resources. Nevertheless, as genome sequencing 
becomes increasingly more affordable, we believe that M. vestita and C. richardii can become 
powerful fern models, which represent heterosporous and homosporous ferns, respectively. 


Key Worps.—Ceratopteris, genomes, Marsilea, model systems, transformation 


With over 12,000 species (PPG I, 2016), ferns are a diverse lineage that is 
integral to understanding terrestrial plant evolution and ecology. For example: 
How has the diversity of life cycles evolved? What genetic, molecular, and 
ecological mechanisms have governed major life cycle changes? What 
mechanisms control the tracheophyte body plan and how have they evolved 
through time? 

The alternation of generations (phases) life cycle is common to all 
embryophytes. However, the life cycle of ferns and lycophytes is unique 
because the sporophyte and gametophyte generations are independent, and the 
sporophyte is generally considered the dominant phase (although gameto- 
phytes of certain species can be long-lived, e.g., Vittaria appalachiana (Farrar, 
1967)). This contrasts with bryophytes, which produce a sporophyte that is 
dependent on the dominant gametophyte, and seed plants in which 
gametophytes are highly reduced, encased within, and completely dependent 
on the dominant sporophyte. The transition between the haploid-dominant 
and sporophyte-dominant life cycle likely had a profound impact on plant 
evolution (Gerrienne and Gonez, 2011; Haufler et al., 2016; Qiu, Taylor and 
McManus, 2012). Ferns are therefore an ideal system to study the mechanisms 
that determine sporophyte and gametophyte development in comparison to 
other land plants. Land plant life cycles also vary in the types of spores they 
produce and the sexual condition of the gametophytes. For example, the 
Salviniales, Selaginellales, Isoétales, and all seed plants are heterosporous, 
meaning the sporophyte produces megaspores (in megasporangia) and 
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microspores (in microsporangia). These spores germinate and become 
gametophytes that are endosporic (i.e., they develop within the spore wall) 
and dioicous. The vast majority of ferns, on the other hand, are exosporic and 
homosporous, meaning the sporophyte produces spores that are all the same 
size and that have the potential to develop into monoicous gametophytes. 
Platyzoma microphyllum R.Br. (Pteridaceae) is a curious outlier to these two 
general life cycle formats—it is heterosporous, but exosporic and can produce 
monoicous gametophytes (Tryon and Vida, 1967; Duckett and Pang, 2014). The 
transition between homospory and heterospory has occurred independently 
many times across land plant evolution. Ferns are an excellent system to 
provide insight into the mechanisms that govern this transition by comparing 
the biology of the homosporous ferns to the heterosporous ferns (Salviniales) 
and the seed plants. Furthermore, across vascular plants, homospory is 
correlated with a higher chromosome numbers and larger genome sizes 
compared to heterospory. While the reason for this correlation is not fully 
understood (Wolf et al., 2015), it has been hypothesized that high levels of 
polyploidy could maintain genetic diversity in organisms with a high potential 
for inbreeding, linked with the production of monoicous gametophytes 
(Klekowski and Baker, 1966; but see Haufler and Soltis, 1986). Indeed, 
homosporous ferns have notoriously large genomes (average=13.82 Gb, 
range=1.95 - 73.19 Gb; Kuo & Li, 2019) and represent one of the final frontiers 
in exploring plant genome space. 

Ferns are also pivotal in studying the evolutionary development of vascular 
plant body plans. Stem-leaf-root anatomy has evolved in the sporophyte 
multiple times throughout the history of land plants (Boyce, 2005; Hether- 
ington and Dolan, 2018). To truly understand the mechanisms that govern the 
evolution of this anatomy, comparative genetic studies need to compare 
multiple lineages of land plants. For example, by focusing on a lycophyte 
Selaginella kraussiana (Kunze) A. Braun and a fern Osmunda regalis L., 
Harrison et al. (2005) were able to show that the same genetic pathway (in this 
case KNOX-AHP interaction) was independently recruited for the convergent 
evolution of leaves. Similar comparative studies in the future will provide not 
only new insights into the field of plant Evo-Devo, but will also reveal the 
genetic mechanisms that may have developed uniquely in ferns. 

Although ferns are essential to understanding land plant evolution through 
comparisons with other lineages, they are also interesting in their own right. 
The sex of many homosporous ferns can be influenced by their environment, a 
phenomenon unknown in other plant lineages. For example, the gametophytes 
of proposed model Ceratopteris richardii may be either male or hermaphro- 
ditic. Hermaphroditic gametophytes secrete a compound called antheridiogen, 
which promotes development of neighboring gametophytes as exclusively 
antheridial (Eberle et a/., 1995). Other biotic and abiotic factors, such as light 
(Kamachi et al., 2007) and soil bacteria (Ganger et al., 2019), can also influence 
gametophyte sex expression. In addition, no other plants are known to engage 
in a vertically transmitted cyanobacterial symbiosis like Azolla (Wagner, 
1997), making Azolla an excellent system to study symbiosis biology. And 
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finally, few plants can accumulate arsenic levels as high as Pteris vittata L. (Ma 
et al., 2001), making it an ideal system for studying the cellular and molecular 
mechanisms involved in heavy metal tolerance. 

Despite their utility for studying plant evolution, development, molecular 
biology, and ecology, a model fern has not yet been completely developed. A 
fully functional model requires a high-quality reference genome and efficient 
transformation methods. Additionally, they should not be polyploid, should 
be easily maintained in a laboratory environment, and have a relatively short 
generation time. Among ferns, only the genomes of Azolla filiculoides Lam. 
and Salvinia cucullata Roxb. have been sequenced (Li et al., 2018), but genetic 
manipulation tools are lacking in these species (Table 1). On the other hand, 
transformation techniques have been developed for species that lack a fully 
sequenced genome, including: Marsilea vestita Hook. & Grev., Adiantum 
capillus-veneris L., Pteris vittata, and Ceratopteris richardii Brongn. (Table 1; 
Bui et al., 2015; Kawai et al., 2003; Kawai-Tooyooka et al., 2004; Klink and 
Wolniak, 2000; Muthukumar et al., 2013; Plackett et al., 2014, 2015; Stout et 
al., 2003). In this review, we describe potential model ferns, as well as the tools 
and resources that have already been developed for each. 


AZOLLA FILICULOIDES & SALVINIA CUCULLATA 


Azolla and Salvinia are genera of floating aquatic ferns belonging to the 
Salviniacaeae. This family and its sister lineage Marsileaceae, are the only 
heterosporous ferns (Figure 2; PPG 1, 2016). The relationships of species 
within the genus Azolla require further resolution, but it likely contains five to 
seven species (Evrard and Van Hove, 2004; Metzgar, Schneider, and Pryer, 
2007; Reid, Plunkett, and Peters, 2006). Salvinia consists of around twelve 
species, and the position of S. cucullata, in particular, is uncertain 
(Nagalingum, Schneider and Pryer, 2008). 

Azolla is well known for housing an obligate, vertically transmitted 
nitrogen-fixing cyanobacterium (Nostoc azollae) within specialized leaf 
cavities. Because of this symbiosis, Azolla has been used for over 1,000 years 
to fertilize rice paddies in Southeastern Asia (Lumpkin and Plucknett, 1980). 
Additionally, due to its fast growth rate and high protein content, Azolla has 
been used as supplementary feed for poultry (Basak et al., 2002), fish (Abou et 
al., 2007), and livestock (Cherry] et al., 2013). Azolla has also been investigated 
as a means to treat wastewater contaminated with pollutants like arsenic (Leao 
et al., 2017), synthetic textile dyes (Kooh et al., 2016a, 2016b), swine waste 
(Muradov et al., 2014), fluoride (Zazouli et al., 2014), excess nitrogen and 
phosphorus (Forni et al., 2001), and zinc (Zhao et al., 1999). Finally, using 
Azolla as a biofuel has been pursued as extracted lipids are suitable for the 
synthesis of biodiesel and meet requirements on fuel density, cetane number, 
and iodine value (Brouwer et al., 2016; Salehzadeh, Maeemi and Arasteh, 
2014). 

Salvinia plants grow in large mats, lack roots, have two floating leaves and a 
highly dissected, submerged leaf, which bears clusters of sori (Nagalingum, 
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Heterosporous life cycle (Salviniales, seed plants, some lycophytes) 

Megaspore-———> _Megagametophyte ————-> Egg 

Sporophyte 

ia Microspore-+———_> Microgametophyte ————-> Sperm 
Homosporous life cycle (most ferns, Lycopodiaceae) 


Archegonia bearing 


yw gametophyte Piper 
at | 


7 | 


ae Y 
Sporophyte ————» Spore ------ » Monoicous gametophyte Ben 
Bd, ee Sperm 
_ arent bearing 
gametophyte 


Fic. 1. Comparing the general form of homosporous and heterosporous plant life cycles. Some 
lycophytes, all seed plants, and Salviniales (including the prospective models Azolla filiculoides, 
Salvinia cucullata, and Marsilea vestita) are heterosporous and produce endosoric, dioicous 
gametophtyes. The majority of the ferns (including Adiantum capillus-veneris, Pteris vittata, and 
Ceratopteris richardii) are homosporous and produce exosporic, potentially monoicous 
gametophytes. *Must be produced in megasporangia and microsporangia, respectively. Dashed 
lines represent possible paths of gametophyte development. 


Schneider, and Pryer, 2006; Nagalingum, Nowak, and Pryer, 2008). In contrast 
to Azolla’s reputation as a generally beneficial plant, Salvinia is most well 
known as a noxious weed. With the exception of some limited research into 
the phytoremediation capacity of Salvinia (Baral et al., 2009), much of the 
biological research concerning this genus has involved controlling its weedy 
tendency (Room et al., 1981; Room, 1990). In the materials sciences, Salvinia 
has garnered attention for its ability to retain air on its leaf surfaces (termed the 
“Salvinia Effect”) by means of a unique combination of hydrophilic patches on 
a highly hydrophobic surface (Barthlott et al., 2010). 

Genomic resources.—Azolla filiculoides and Salvinia cucullata both are 
diploid and have relatively small genomes (0.75 Gb in A. filiculoides and 0.26 
Gb in S. cucullata). They are the only two ferns for which genomes have been 
sequenced (Li et al., 2018). The assembled genomes have N50 contig sizes of 
964.7 Kb for A. filliculoides and 719.8 Kb for S. cucullata. A high level of 
completeness was indicated for both assemblies by BUSCO (Benchmarking 
Universal Single-Copy Orthologs) assessment and Illumina read-mapping 
results. Li et al. (2018) identified 20,201 high confidence gene models in A. 
filliculoides and 19,914 in S. cucullata that were supported by transcripts or 
were significantly similar to other known plant proteins. Additionally, 
medium-coverage resequencing was done on five other Azolla species, laying 
the foundation for future pan-genome and trait association studies. Genomic 
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Adiantum capillus-veneris 
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Ceratopteris richardii 
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Salvinia cucullata 
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Fic. 2. Phylogeny of fern families, highlighting prospective models. Photo credits: Pi-Fong Lu (A. 
capillus-veneris, P. vittata, and S. cucullata), Chi-Lien Cheng (C. richardii), Wikimedia Commons 


(M. vestita). 
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and transcriptomic data are available at FernBase (www.fernbase.org), along 
with genome browsers and BLAST utilities. 

Available tools and technologies.—Brouwer et al. (2014) described a series 
of methods for collecting A. filiculoides spores, spore germination and in vitro 
fertilization, and cryopreservation of fertilized megaspores with their N. 
azollae symbiont. Both A. filiculoides and S. cucullata are relatively easy to 
maintain by growing in water in a growth chamber or greenhouse. 

Missing tools.—Currently no transformation method has been developed for 
this lineage. Past studies have shown that it is feasible to generate protoplasts 
and callus tissues from several Azolla (Redford et al., 1987; Sini, Smitha, and 
Madhusoodanan, 2014) and Salvinia species (Nakamura and Maeda, 1994), 
which could be useful for future development of transformation methods. It 
should also be noted that sexual reproduction of S. cucullata is seldom 
observed in the lab environment, which is not ideal for a model system. 

Assessment as a model fern.—Both A. filiculoides and S. cucullata lack 
significant resources required of a model organism. Although neither A. 
filiculoides nor S. cucullata are particularly representative of the fern lineage 
as a whole, A. filiculoides would make an excellent model for studying plant- 
bacterial symbioses. 


MaARSILEA VESTITA 


Marsileaceae represents the other family of heterosporous ferns, sister to 
Salviniaceae (Figure 2). Marsilea, sometimes called the “water clover,” is a 
cosmopolitan genus of about 50 species (Whitten, Jacono, and Nagalingum, 
2012) of aquatic to amphibious perennial ferns, which spread by rhizomes that 
may be floating, creeping, or subterranean (Jacono and Johnson, 2006). One 
particular Marsilea species, M. vestita, has been the focus of diverse 
developmental studies. Yet, the exact taxonomic placement of M. vestita 
remains unclear (Whitten, Jacono, and Nagalingum, 2012). 

The leaves of Marsilea are unlike those of any other fern, consisting of four 
terminal leaflets in a cruciform arrangement borne on a petiole. It is the only 
group of ferns to exhibit true nyctinasty, or daily movement of leaf orientation 
(Minorsky, 2018). This leaf arrangement is in contrast to the closely related 
genera Regnellidium (with two leaflets) and Pilularia (with a highly reduced, 
filiform leaf) (Pryer and Hearn, 2009). The highly desiccation tolerant 
reproductive structures of Marsilea, called sporocarps, consist of a stalk 
(termed “peduncle,” “stipe,” or “pedicel”) and a sclerified wall surrounding 
bisporangiate sori (Nagalingum, Schneider and Pryer, 2006, 2007). 

Marsilea vestita has been used as a model to study its uniquely rapid process 
of spermatogenesis (Wolniak et al., 2011). Upon rehydration of a microspore, a 
microgametophyte develops endosporically and spermatogenesis completes 
within 12 hours, releasing 32 highly flagellated sperms. The pioneering work 
by Stephen Wolniak and colleagues has shown that the rapid development of 
microgametophytes relies on translating stored RNA from microspores, and 
there is little or no de novo gene transcription (Boothby et al., 2013). At 
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different stages of microgametophyte development, specific pools of stored 
pre-mRNAs are spliced to remove introns and enable translation. Through 
RNAi silencing, it was further demonstrated that such stage-specific mRNA 
maturation is required for proper gametophyte development (Boothby et al., 
2013; Wolniak et al., 2011). A similar process also likely underlies 
megagametophyte development in M. vestita (Kuligowski, Ferrand, and 
Chenou, 1991), but few follow-up studies have been done. 

Available tools and technologies.—Marsilea vestita can be easily grown in a 
container of water in a growth chamber or greenhouse environment. The 
history of research on M. vestita has resulted in numerous tools that will be 
useful to its development as a model system. A number of methods have been 
developed to localize mRNA transcripts during microgametophyte develop- 
ment in M. vestita (Boothby and Wolniak, 2011; Deeb et al., 2010; Kuligowski, 
Ferrand, and Chenou, 1991; Tsai, Van Der Weele, and Wolniak, 2004; Van Der 
Weele, Tsai, and Wolniak, 2007). Most importantly, M. vestita was the first fern 
in which RNAi was applied to manipulate gene expression (Klink and 
Wolniak, 2000, 2001, 2003; Tsai and Wolniak, 2001). This was done by directly 
incubating microspores in the double-stranded RNA solution. It is, however, 
unclear if the same approach can be readily applied in other tissue types, and 
whether the silencing effect can be passed beyond fertilization to the 
sporophyte generation. 

Missing tools.—Although there are several RNA-sequencing datasets 
published (Boothby et al., 2013; Tomei and Wolniak, 2016), no reference 
genome exists for M. vestita. Because the smallest Marsilea genome reported to 
date is only 1.34Gb (Li et al., 2018; Kuo and Li, 2019), sequencing and 
assembling the M. vestita genome would likely be easier than a large 
homosporous fern genome. It should be noted that the generation time (from 
spore to spore) of M. vestita is unclear, as most of the past research focused on 
spermatogenesis that does not require the completion of life cycle. 

Assessment as a model fern.—Lacking genomic data is not a significant 
hurdle in the effort to make M. vestita a model fern. Given the interest in 
addressing fundamental questions of life cycle evolution using ferns, a model 
heterosporous fern would be highly desirable, even if it is not representative of 
the entire fern lineage. The numerous studies conducted on M. vestita 
spermatogenesis make it a good system for studying mechanisms of cell 
biology, as well as a promising candidate to become a model heterosporous 
fern. 


ADIANTUM CAPILLUS-VENERIS 


Adiantum is a cosmopolitan, homosporous genus of 225 species in the 
Pteridaceae, nested within the leptosporangiate ferns (Figure 2; Huiet et al., 
2018; PPG 1, 2016). Molecular analyses revealed that Adiantum is sister to the 
vittarioids (shoestring ferns), despite stark morphological and ecological 
differences (Pryer et al., 2016; Schuettpelz and Pryer, 2007). The sporophytes 
of Adiantum are terrestrial, shade-loving, and bear compound leaves which, 
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when fertile, produce sporangia only on false indusia. Adiantum gameto- 
phytes are determinate and cordate, with a distinct midrib and broad wings. 
Vittarioid sporophytes, on the other hand, are epiphytic and have simple, 
strap-shaped leaves with sori borne on the lamina. Vittarioid gametophytes are 
indeterminate, strap shaped, exceptionally long-lived, and can reproduce 
asexually via gemmae. Additionally, it is likely that at least one genome 
duplication occurred in the vittarioids after splitting from Adiantum (Pryer et 
al., 2016). The stark morphological, ecological, and genomic differences that 
exist between these sister lineages make them ideal for comparative studies to 
elucidate the genetic mechanisms that control traits like leaf shape and 
determinate versus indeterminate growth of gametophytes. 

Adiantum, in particular A. capillus-veneris, is perhaps most notable because 
it has served as a model to study photobiology. Masamitsu Wada and 
colleagues have used A. capillus-veneris to make several breakthroughs in 
phototropism, polarotropism, and chloroplast movement (Doi and Shimazaki, 
2008; Imaizumi, 2000; Doi, Wada, and Shimazaki, 2006; Tsuboi et al., 2012). 
Several photoreceptors have also been functionally characterized in detail, 
including neochrome, a phytochrome-phototropin chimeric receptor (Li and 
Mathews, 2016; Wada, 2013). In addition, the first complete fern plastome was 
generated from A. capillus-veneris (Wolf et al., 2003), which was used to study 
levels of RNA-editing in the chloroplast genome (Wolf, Rowe, and Hasebe, 
2004). 

Available tools and technologies.—It is possible to manipulate gene 
expression in A. capillus-veneris via both gene overexpression and silencing. 
Genetic transformation by particle bombardment has been developed for 
gametophyte tissues and has been applied to rescue photoreceptor mutants 
(Kawai et al., 2003), as well as to localize gene expression in conjunction with 
GUS (Tsuboi et al., 2012). In addition, Kawai-Toyooka et al. (2004) showed 
that gene silencing can be achieved by bombarding gametophytes with double- 
stranded DNA instead of a traditional RNAi approach. This DNAi method can 
produce up to 90% gene silencing efficiencies. 

Missing tools.—While an EST library from gametophytes (Yamauchi et al., 
2005) and a leaf transcriptome (Qi et al., 2018) have been published, no whole 
genome sequence has yet been generated for A. capillus-veneris. The genome 
size is unknown, although likely to be around 4—6Gb based on the estimates 
from congeneric species (Bainard et al., 2011; Kuo and Li, 2019). 

Assessment as a model fern.—In terms of morphology, reproductive 
strategy, and habit, A. capillus-veneris is a good representative of the fern 
lineage. It has been a useful system to study fern photobiology, and could 
provide insights into fundamental evo-devo questions, especially in compar- 
ison with the vittarioids. 


PTERIS VITTATA 


Pteris is also a homosporous, leptosporangiate fern in the Pteridaceae 
(Figure 2; PPG1, 2016). The genus has now been recovered as monophyletic 
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and represents one of the largest fern genera, containing three subgenera and 
200-250 species distributed globally (Zhang and Zhang, 2018). This diversity 
is reflected in the breadth of anatomy and habitat in which Pteris species are 
found. 

In 2001, Pteris vittata was discovered to hyperaccumulate arsenic (as high as 
1% dry weight; Ma et al., 2001). Though some other species in Pteris have also 
been found to possess this ability, P. vittata is the most efficient (Luongo and 
Ma, 2005). Because of its potential for phytoremediation applications, much of 
the research conducted on P. vittata has focused on elucidating the mechanism 
of arsenic accumulation (Cesaro et al., 2015; Datta et al., 2017; Gu et al., 2018; 
reviewed in Xie et al., 2009). Pteris vittata gametophytes, in particular, were 
proposed as a model system for analyzing arsenic hyperaccumulation because 
the rapid growth rate, small size, ease of culture, and haploid genome are more 
conducive to research than the sporophytes (Gumaelius et a/., 2004). A number 
of transporters from P. vittata have been identified that mediate arsenite uptake 
(DiTusa et al., 2016; He et al., 2016; Indriolo et a/., 2010). Moreover, expression 
of a P. vittata arsenite antiporter ACR3;1 was able to reduce arsenic 
accumulation in shoots of Arabidopsis and tobacco, demonstrating a potential 
application in crops (Chen et al., 2017). 

Available tools and technologies.—To test the function of ACR3 in P. vittata, 
a gene knock-down method by RNAi was developed, in which the RNAi 
constructs were biolistically bombarded into gametophyte tissues (Indriolo et 
al., 2010). Muthukumar et al. (2013) later showed that stable transformation 
can be achieved by co-incubation of spores with Agrobacterium tumefaciens. 
The transformation efficiency was reported to be around 0.05% (Muthukumar 
et al., 2013). 

Missing tools.—There is not yet a genome sequence for P. vittata, nor is there 
any published information on its genome size. Additionally, it should be noted 
that P. vittata is predominantly a tetraploid, and diploid individuals have a 
restricted geographical distribution (Srivastava, Ranade, and Khare, 2007). 
Furthermore, there is a mixture of reproductive strategies in this species, 
including sexual and apomictic, along with a range of ploidy levels (Chao et 
al., 2012). The identification and collection of a sexual diploid strain would be 
vital to the future development of P. vittata as a model. Finally, there is no 
published guideline or manual on how to grow and maintain P. vittata in the 
lab. 

Assessment as a model fern.—P. vittata lacks two of the most fundamental 
aspects of a model organism: genomic data and diploid representatives. While 
its development as a model organism would provide unique insight into 
arsenic tolerance and accumulation, significant resources would have to be 
dedicated to filling these gaps. 


CERATOPTERIS RICHARDI 


Ceratopteris is also a homosporous, leptosporangiate fern in the Pteridaceae, 
though the taxonomy within the genus remains somewhat unsettled (March- 
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ant, 2019). Polyploids and interspecific hybrids appear to be common in this 
genus (Lloyd, 1974), and further systematic work is needed, especially if 
members of this genus are to be used as model organisms. 

Ceratopteris has been dubbed “the Arabidopsis of the fern world” (Sessa et 
al., 2014). It has emerged as the most promising fern model species, largely 
because it can be readily cultured and transformed in the lab. Ceratopteris 
richardii is a diploid (n=39) aquatic fern that has been identified from Africa, 
Southeastern Asia, Japan, Australia, Fiji, and the Hawaiian Islands, although 
its highest concentration is in the Americas. It grows as an annual with a short, 
upright rhizome and slender, dimorphic leaves and divergent branches. Its 
short generation time (the life cycle can be completed in about 120 days) and 
small size make it a convenient lab model. Mature sporophytes are only about 
5cm tall and spread over a diameter of less than 3cm, and gametophytes are 
even smaller. Thus, large numbers of plants can be cultured in a small growth 
chamber or greenhouse, enabling mutant screens. Plants can be vegetatively 
propagated easily from buds found on senescing fronds. Additionally, 
protoplasts can be isolated from gametophytes, and can regenerate to produce 
cultures (Edwards and Roux, 1998). A single mature sporophyte can produce 
around one million spores (per plant), which can be stored and remain viable 
for years (Chatterjee and Roux, 2000). Furthermore, a fast-growing cultivar 
strain of C. richardii, marketed as the “C-fern,” has been used as an educational 
model for K-12 and undergraduate instruction (Renzaglia and Warne, 1995). 
The “C-fern express” (also marketed for classroom use), on the other hand, is 
C. thalictroides (L.) Brongn., a tetraploid species (Hickok, Warne, and 
Fribourg, 1995). 

Ceratopteris richardii is capable of intra-gametophytic self fertilization, 
resulting in completely homozygous sporophyte offspring (Haufler et al., 
2016). At least three homozygous strains have been established (Hickok, 
Warne, and Fribourg, 1995). The most commonly used strain is Hn-n, which 
came from a spore on an herbarium specimen collected in Cuba. Two other 
strains are D176 from Guyana and PhiN8 from Nicaragua; they differ in leaf 
morphology, spore number per sporangium, as well as antheridiogen response 
(Hickok, Warne, and Fribourg., 1995; McGrath et al., 1994). All C. richardii 
strains can be crossed with each other and yield fertile F1 progeny (Hickok, 
Warne, and Fribourg, 1995). 

Ceratopteris richardii has enabled research on a wide variety of topics, 
including gametophyte development (Banks, 1999), cell wall development 
(Leroux et al., 2013), evo-devo (e.g., Hasebe et al., 1998; Sano et al., 2005; 
Plackett et al., 2018), and plant responses to gravity in space flight (Bushart et 
al., 2013; Edwards and Roux, 1998; Salmi and Roux, 2008; Salmi et al., 2011). 
Ceratopteris richardii gametophytes in particular have served as a model for 
elucidating mechanisms of sex determination in ferns. Several sex determi- 
nation mutants have been identified (Banks, 1994; Banks, 1997a, 1997b; Eberle 
et al., 1995; Strain, Hass, and Banks, 2001), and studies have been conducted 
on the hormones involved in sex determination (Atallah and Banks, 2015) as 
well as the effects of light (Kamachi et al., 2004, 2007; Murata and Sugai, 2000; 
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- Spiro, Torabi, and Cornell, 2004) and the soil microbiome (Ganger et al., 2019) 
on gametophyte development and sex determination. 

Ceratopteris richardii spores are the largest recorded within the homospo- 
rous ferns (~70—150 um in diameter), making them useful for electrophys- 
iological studies on signal transduction pathways, which could be 
complicated by multicellularity (Chatterjee and Roux, 2000; Chatterjee et al., 
2000). The development of genetically identical spores is easy to synchronize, 
which allows for study and manipulation of the direction of polarity and cell 
level gravity responses within the first 24 hours of gametophyte development 
(Chaterjee and Roux, 2000). In 1999, C. richardii became the first ever “space 
fern,” on board Space Shuttle Columbia as a part of the STS-93 shuttle mission 
(Roux et al., 2003). It was found that in the spaceflight environment, around 
5% of the transcripts in the C. richardii cDNA microarray were differentially 
regulated (Salmi and Roux, 2008). 

Two recent studies fully explore C. richardii’s potential as a model 
organism. Bui et al. (2017) investigated the genetic basis of fern apogamy—a 
type of asexual reproduction where sporophytes develop directly from 
gametophyte somatic cells without fertilization. Using a combination of 
transcriptome-sequencing, in situ hybridization, gene overexpression, and 
RNAi knockdown, it was demonstrated that C. richardii AINTEGUMENTA is 
required for promoting apogamous sporophytes (Bui et al., 2017). The second 
study focused on testing the function of LEAFY homologs in C. richardii 
(Plackett et al., 2018). While LEAFY in flowering plants is a key transcription 
factor that marks floral meristems, its role in ferns has been unknown. Using a 
similar suite of tools to Bui et al. (2017), Plackett et al. (2018) showed that C. 
richardii LEAFY plays an important role in maintaining apical stem cells 
throughout both sporophyte and gametophyte development. 

Available tools and technologies.—Ceratopteris richardii is by far the most 
genetically tractable fern system (Table 1). Early attempts to manipulate gene 
expression involved RNAi and DNA vector-based gene silencing (Rutherford et 
al., 2004; Stout et al., 2003). More recently, stable transformation has been 
achieved for C. richardii and C. thalictroides (“C-fern express”) using 
Agrobacterium-mediated transformation (Bui et al., 2015; Muthukumar et al., 
2013) and particle bombardment (Plackett et al., 2014; Plackett, Rabbino- 
witsch, and Langdale, 2015). The efficiency can be as high as 87% for transient 
transformants and 2.6% for stable transformants (Bui et al., 2015). In addition, 
a genetic linkage map has been produced for C. richardii using 488 doubled 
haploid lines that were genotyped for 368 restriction fragment length 
polymorphisms, 358 amplified fragment length polymorphisms, and three 
isozyme markers (Nakazato et al., 2006). This has been used to conduct 
quantitative trait locus (QTL) analysis to study reproductive barriers in C. 
richardii (Nakazato et al., 2007). 

Missing tools.—The 11.25 Gb genome of C. richardii has yet to be fully 
sequenced (Marchant, 2019). Low coverage (1.73X) genome skimming data 
have been published (Wolf et al., 2015) and the publication of a partially 
assembled genome is imminent (Marchant et al., 2019). 
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Assessment as a model fern.—A multitude of useful techniques have been 
developed for C. richardii, and it serves as the most prominent and promising 
prospective model fern at this point. 


CONCLUSIONS AND OUTLOOK 


All of the species reviewed here have the potential to become a model fern, 
but they all currently lack significant components of a complete model system. 
A genome has only been fully sequenced for Azolla filiculoides and Salvinia 
cucullata, yet no transformation methods have been developed for them. The 
closely related Marsilea vestita, on the other hand, lacks a genome, but 
extensive research has been conducted to develop RNAi methods for this 
species. If a genome were sequenced for M. vestita, which is likely to be one of 
the smallest genomes in ferns (Li et al., 2018; Kuo and Li, 2019), much of the 
foundational work has already been completed to make it a promising model 
heterosporous fern. 

Adiantum capillus-veneris has served as a model to study fern photobiology 
and some transformation methods have been developed; however, the genome 
has also not been sequenced. Similarly, Pteris vittata has garnered attention for 
its potential in the phytoremediation of arsenic. Though an Agrobacterium- 
mediated transformation method has been developed for P. vittata, the lack of 
a genome sequence and diploid strain hinder its emergence as a model. 

The most developed model fern remains Ceratopteris richardii (also see 
Marchant, 2019). It has served as a model fern to tackle the genetic basis of sex 
determination and apogamy, as well as to elucidate the major transitions in 
plant evolutionary development. Efficient transformation methods via Agro- 
bacterium and particle bombardment have been developed, and as exemplified 
by two recent studies by Bui et al, (2017) and Plackett et al. (2018), such 
genetic tools are powerful to link genes to specific biological functions. 
Importantly, the robust transformation methods also pave the way for 
developing CRISPR/Cas9 gene editing capacity, which has yet to be achieved 
in ferns (Table 1). Nevertheless, the 11.25 Gb genome, though typical of ferns, 
is limiting the full potential of C. richardii as a model organism. Fortunately, 
with the rapid advancement of sequencing technologies, we believe a complete 
C. richardii genome should be delivered in the near future. 

It would, of course, be ideal to establish more than one model organism for 
ferns. Marsilea vesitita is diploid, easy to grow in the lab, has not only a small 
genome that would be easy to sequence, but also numerous developed 
transformation techniques. Though the generation time is unclear, we envision 
that M. vestita would be the next most readily developed model and would 
nicely complement the homosporous C. richardii to serve as a heterosporous 
fern model. While using M. vestita and C. richardii as models will certainly not 
represent the whole of fern diversity, uniting resources behind these two 


models will enable plant biologists to study both ferns and the greater trends in 
land plant evolution. 
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ApstTrAc’.—The large genomes of ferns have long deterred genome sequencing efforts. To date, only 
two heterosporous ferns with remarkably small genomes, Azolla filiculoides and Salvinia 
cucullata, have been sequenced. However, as sequencing technologies continue to improve and 
become more affordable, generating high-quality, “normal-sized” fern genomes is within reach. 
Here we provide new genome size data and discuss candidates for whole genome sequencing. In 
particular, we identified 18 species representing major branches in the fern phylogeny that are 
worth pursuing. We also review the current sequencing technologies and offer our opinions on the 
best sequencing approach for these fern species. 


Key Worns.—evolutionary genomics, flow cytometry, genome size, phylogeny 


The past decade has witnessed a tremendous “genomic bloom,” with over 
three hundred plant genomes published (Fig. 1; https://plabipd.de/portal/ 
sequenced-plant-genomes). This is largely due to the rapid advancement of 
sequencing technologies; for example, the genome of Arabidopsis thaliana (L.) 
Heynh. that once cost US $100 million can now be obtained for less than 
$1,000 on a portable Nanopore MinION sequencer (Michael et al., 2018). 

However, the vast majority of the available plant genomes are from seed 
plants (Fig. 1), and only seven seed-free plant genomes have been sequenced. 
While such an imbalance had been predicted and cautioned by Pryer et al. 
(2002) in the early Sanger-era of genome sequencing, a large gap in the seed- 
free lineages still persists today. The lack of genomic information from all 
branches of the plant tree of life has not only limited research on the major 
transitions in plant evolution, but also hindered investigations into the biology 
of seed-free plants. 

Ferns are one of the final frontiers in the exploration of plant genome space. 
Although fern genomes have long been regarded as too prohibitively large for 
sequencing, there are remarkable exceptions. Obermayer et al. (2002) reported 
a genome size of 0.75 gigabase (Gb) for Azolla in the heterosporous order 
Salviniales, which is significantly smaller than those of homosporous ferns. In 
our recent survey of genome sizes in Salviniales, we not only were able to 
confirm the Azolla genome size, but also discovered the Salvinia cucullata 
Roxb. genome to be merely 0.26 Gb, a size only moderately larger than A. 
thaliana. We recently completed the genomes of Azolla filiculoides Lam. and 
S. cucullata (Li et al., 2018), which offered the first insights into fern genome 
space and evolution. It should be noted, however, that both of the sequenced 


* 


corresponding author: f1329@cornell.edu 


KUO & LI: FERN GENOME SEQUENCING 213 


No. published genomes 


Seed plants ie ‘w 
Ferns ; 
Lycophytes |3 
Mosses 


Liverworts |1 


Hornworts' |o 


Fic. 1. The current landscape of published plant genomes. 


species belong to Salviniaceae and it is unclear whether these small-sized 
genomes are representative of ferns as a whole. In other words, we need more 
fern genomes. 

The goal of this paper is to identify candidates for future genome 
sequencing. We generated 35 new C-value estimates using flow cytometry 
and compiled the most up-to-date genome size data for ferns. We identified 18 
species representing major branches in the fern phylogeny that are worth 
pursuing initially. 


MATERIALS AND METHODS 


Sampling.—Fresh leaf and spore materials of 35 species (covering 15 
families and 20 genera) were collected from the field, the Dr. Cecilia Koo 
Botanic Conservation Center, or the Taipei Botanical Garden. This taxon 
sampling focused on the families and genera that have been reported to have 
relatively small genome sizes (Clark et al., 2016), or have little C-value 
information. Voucher information is provided in Table 1. 

Flow cytometry.—We used flow cytometry to estimate genome sizes 
following the protocol of Kuo et al. (2017) and Kuo and Huang (2017). Various 
chopping buffers were used, including the Beckman, LB01, and GPB buffers. 
All of them have 0.5% (v/v) 2-mercaptoethanol, and RNaseA 0.1 mg/ml. 4% 
PVP-40 was used for Beckman and LBO1, and 3% for GPB. Other details of 
buffer preparation and storage condition can be found in Dolezel, Binarova, 
and Lucretti (1989), Ebihara et al. (2005), and Loureiro et al. (2007). For the 
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internal standard, we used Pisum sativum cv. “Ctirad”, Nicotiana tabacum cv. 
“Xanthi”, Secale cereale cv. “Dankovske”, Vicia faba cv. “Inovec”, or Zea mays 
“CE-777”; their genome sizes were reported by Dolezel, Sgorbati, and Lucretti 
(1992) and DoleZel et al. (1998), Lysak and Dolezel (1998), and Johnston et al. 
(1999). Samples were run on the BD FACSCan system (BD Biosciences, 
Franklin Lake, New Jersey). For both the sample and internal standard, we 
aimed to have a coefficient of variation (CV) <5% and >1,000 particles 
collected for each nuclei peak. Three replicates (six for Cibotium taiwanense 
C.M. Kuo) per sample were carried out, and the average estimates are reported 
here. 


RESULTS AND DISCUSSION 


Summary of flow cytometric analyses.—The results of our flow cytometric 
analyses are summarized in Table 1. All the nuclei peaks met our quality (CV) 
and quantity (number of particles) criteria, excepting for Diplopterygium 
glaucum (Thunb. ex Houtt.) Nakai and D. chinensis (Rosenst.) DeVol, which 
have a slightly wider average CV: 5.06 and 5.03%, respectively (Table 1). 

Distribution of fern genome size.—Fern genome size (1C) ranges significantly 
from 0.26 Gb in Salvinia cucullata to 73.19 Gb in Tmesipteris elongata 
Dangeard (Fig. 2, 3). This translates into a striking 282-fold difference. Of the 
48 fern families recognized in PPG I, 38 have genome size data. When 
considering only the minimum 1C value within each family, the median and 
mean estimates are 6.3 Gb and 9.4 Gb respectively (Fig. 2). Salviniales 
(heterosporous) have the smallest genomes—from 0.26 Gb in S. cucullata 
(Salviniaceae) to 1.4 Gb in Marsilea minuta L. (Marsileaceae). Among the 
homosporous orders, Gleicheniales stood out in particular with only 1.95 Gb 
in Diplopterygium chinensis (Gleicheniaceae) and 2.4 Gb in Dipteris conjugata 
Reinw. (Dipteridaceae; Clark et al., 2016). 

Ferns to sequence.—To select candidates for future genome sequencing, we 
focused on species that (1) occupy distinct branches in the fern phylogeny and 
(2) have genomes smaller than 6.5 Gb, which we believe with the current 
technologies is the upper limit for a genome to be realistically sequenced and 
assembled with less than $24,000. The only exception is Ceratopteris richardii 
Brongn., whose genome size is 11.25 Gb (Wolf et al., 2015); this is because C. 
richardii is genetically trackable and has been used as a powerful model 
species (see Marchant, 2019 and Petlewski and Li, 2019 in this issue). A total 
of 18 fern species was identified (Fig. 3 and Table 2), together covering 15 
families and 7 orders (Fig. 3). 

Table 2 also lists our cost estimates for sequencing each of these genomes. 
Currently the best sequencing approach for de novo assembly is by using long- 
read technologies such as PacBio and nanopore (Li and Harkess, 2018). 
However, achieving high enough long-read coverage (usually 50X) can be very 
costly for large genomes, and one alternative is to adopt a hybrid assembly 
strategy that utilizes a lower long-read coverage (20X) plus ~50X Illumina 
short reads. While being more economical (Table 2), this hybrid approach 
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Count 
Aw 


Genome size (Gb) 


Fic. 2. The distribution of fern genome sizes at the family level. For each family only the lowest 
estimate was included. 


typically results in a less continuous assembly than that based solely on long 
reads. 

Our calculations of the sequencing cost were based on various online quotes 
and our own experience. For long reads, nanopore is probably less expensive 
than PacBio on a per Gb basis; we estimated $68.75 per Gb for nanopore, given 
a 12Gb output per flowcell and each flowcell and the associated reagents 
costing $825 (the cost per flowcell depends on the order size; it can be as low 
as $500). For Hlumina, we estimated $26.5 per Gb, given that one lane of 
HiSeq4000 generates roughly 100Gb of data, and each lane plus library prep 
costs around $2,650. 

Another possible sequencing strategy for large genomes is using 10X 
Genomics to generate long linked-reads (Li and Harkess, 2018). While this 
technology has been very successful in producing high-quality genomes for 
several animal species, it has not been widely tested in plants. Nevertheless, 
because of the low cost (roughly $3,000 for a 1 Gb genome), 10X Genomics 


could be a technology to consider in the next five years for fern genome 
sequencing. 
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Fic. 3. Phylogenetic placement of candidate species for future whole genome sequencing efforts 
and their genome sizes (orange bars). Only the lowest size estimate was included for each fern 


family. 
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Additional considerations.—Although we identified 18 candidates based on 
genome size and phylogenetic position, future work still needs to carefully 
assess the availability of plant materials, feasibility to obtain high quality DNA, 
and degree of heterozygosity. Genome sequencing, especially in the case of 
long reads, requires a large quantity (>20 ug) of high molecular weight DNA 
(>50 kb average length). This is not a trivial task and optimization is likely 
needed for each species. Therefore, it would be ideal to have the target plant 
stably growing in a botanical garden, greenhouse, or growth chamber, so that 
repeated DNA extractions can be obtained from the same plant. Furthermore, 
transcriptomes from various organs (e.g., roots, rhizomes, and leaves) are 
fundamental for annotating genes in a genome assembly, which again requires 
a living plant. 

A highly heterozygous genome would be much harder to assemble than a 
homozygous one. Heterozygosity can be assessed by low-coverage Illumina 
sequencing (20-40X) and should be done before committing the all-in, 
expensive long-read sequencing. Many fern species are capable of gameto- 
phytic selfing (Haufler et al., 2016), which results in completely homozygous 
offspring. Generating such a homozygous individual might save a considerable 
amount of sequencing cost and make downstream bioinformatic analyses 
significantly easier. 

It should also be noted that the currently available genome size data are not 
evenly distributed, and certain species/genera/families have disproportionally 
more estimates than others. Future work should also focus on the families that 
lack any genome size data and those that have very few estimates. 

Summary.—With the rapid improvement of sequencing technologies and 
the plummeting cost, it is becoming increasingly easy to sequence plant 
genomes. However, careful planning is absolutely required to decide which 
species to sequence and how to sequence. In this paper, we compiled the most 
up-to-date fern genome size data, and identified 18 species with medium-sized 
genomes that span the fern tree of life. We anticipate this list of candidates will 
provide a clear roadmap going forward not only to better understand the fern 
genome space, but also to tackle long-standing questions on polyploidy and 
chromosome evolution in ferns. 
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Asstract.—Allopolyploidization is a common mode of speciation in ferns with many taxa having 
formed recurrently from distinct hybridization events between the same parent species. Each 
hybridization event marks the union of divergent parental gene copies, or homeologs, and the 
formation of an independently derived lineage. Little is known about the effects of recurrent 
origins on the genomic composition and phenotypic variation of allopolyploid fern taxa. To begin 
to address this knowledge gap, we investigated gene expression patterns in two naturally formed, 
independently derived lineages of the allotetraploid fern Polypodium hesperium relative to its 
diploid progenitor species, Polypodium amorphum and Polypodium glycyrrhiza. Using RNA- 
sequencing to survey total gene expression levels for 19194 genes and homeolog-specific 
expression for 1073 genes, we found that, in general, gene expression in both lineages of P. 
hesperium was biased toward P. amorphum—both by mirroring expression levels of P. amorphum 
and preferentially expressing homeologs derived from P. amorphum. However, we recovered 
substantial expression variation between the two lineages at the level of individual genes and 
among individual specimens. Our results align with similar transcriptome profile studies of 
angiosperms, suggesting that expression in many allopolyploid plants reflects the dominance of a 
specific parental subgenome, but that recurrent origins impart substantial expression, or 
phenotypic, variation to allopolyploid taxa. 


Key Worps.—differential expression; independently derived lineages; polyploid; reticulate 
evolution; RNA-Seq 


Once considered an evolutionary dead end, polyploidy is now widely 
accepted as an evolutionary force with the potential to influence the fate of 
entire lineages, from plants to fungi to animals, and even protozoans (Stebbins, 
1950; Wolfe and Shields, 1997; Gallardo et al., 1999; Sémon and Wolfe, 2007). 
This is particularly true among vascular plants, where most lineages have 
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experienced one or more ancient polyploidization events (Clark and 
Donoghue, 2018). An increase in ploidy has co-occurred with an estimated 
15% and 31% of speciation events in angiosperms and ferns, respectively 
(Wood et al., 2009). Such a high prevalence of polyploids suggests that genome 
duplication may be a primary facilitator of evolutionary change, allowing these 
organisms to adapt to, and persist in, varying environmental conditions (Soltis 
and Soltis, 2000; Comai, 2005; Otto, 2007; Meimberg et al., 2009; Ramsey, 
2011; Madlung, 2013). 

Allopolyploids are especially intriguing because they result from interspe- 
cific hybridization accompanied by chromosome doubling, and they begin 
their evolutionary journey with full sets of chromosomes from each progenitor. 
The union of divergent, or homeologous, genomes is often associated with 
major genetic and regulatory changes, such as epigenetic modifications 
(Salmon, Ainouche, and Wendel, 2005; Lukens et al., 2006), activation of 
transposable elements (Kashkush, Feldman, and Levy, 2002; Senerchia, 
Felber, and Parisod, 2014), chromosomal rearrangements (Levy and Feldman, 
2004; Chester et al/., 2012), homeologous recombination (Gaeta and Pires, 2009; 
Salmon et al., 2010), and gene loss (Thomas, Pederson, and Freeling, 2006; 
Buggs et al., 2009; Moghe et al., 2014). These genomic changes, in turn, can 
result in altered gene expression patterns and phenotypes (Comai, 2005; Coate 
et al., 2012; Madlung, 2013; Wendel et al., 2018). 

While patterns of gene expression in allopolyploids can vary significantly 
among genes and tissues, as well as among individuals and species, two 
transcriptional phenomena have been widely observed in microarray and 
RNA-sequencing experiments. The first phenomenon, referred to as expression 
level dominance, occurs when the gene expression of an allopolyploid for a 
given gene is statistically identical to the expression level of only one of its 
parent species (e.g., Rapp, Udall, and Wendel, 2009; Flagel and Wendel, 2010; 
Bardil et al., 2011; Yoo, Szadkowski, and Wendel, 2013; Cox et al., 2014). The 
second phenomenon, referred to as homeolog expression bias, is the 
preferential expression of homeologs, or gene copies, from one parent for a 
given gene in an allopolyploid (e.g., Chaudhary et al., 2009; Koh, Soltis, and 
Soltis, 2010; Grover et al., 2012; Combes et al., 2013; Yoo, Szadkowski, and 
Wendel, 2013; Akama et al., 2014; Cox et al., 2014; Edger et al., 2017). Patterns 
of expression level dominance and homeolog expression bias across the 
transcriptome can mirror or contradict the expression of individual genes, 
often with both phenomena co-occurring. For example, Yoo, Szadkowski, and 
Wendel (2013) found that 25% of genes surveyed in the natural allotetraploid 
Gossypium hirsutum exhibited expression level dominance in the direction of 
its G. arboretum (A-genome) parent. Most of the genes that exhibited A- 
genome dominance, however, exhibited near equal expression of homeologs 
derived from both the G. arboretum (A-genome) parent and the second 
parental species, G. raimondii (D-genome). In general, the interplay of 
expression level dominance and homeolog expression bias in an allopolyploid 
is thought to reflect a divergence in the number and distribution of 
transposable elements (TEs) and cis/trans regulatory elements between 
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parental subgenomes, ultimately mediating the process of genome fraction- 
ation, or loss of duplicated gene copies (Steige and Slotte, 2016; Bottani et al., 
2018; Edger et al., 2018; Hu and Wendel, 2018; Wendel et al., 2018). 

Increasingly, it is recognized that many, if not most, allopolyploid species 
form recurrently from distinct hybridization events between the same 
progenitor species (Soltis and Soltis, 1999) and comprise multiple, indepen- 
dently derived lineages (IDLs). In some cases, allopolyploid species form 
reciprocally, with IDLs having different maternal progenitor species and, 
consequently, different maternally-inherited plastid genomes (most plants 
have maternal inheritance of plastids; Reboud and Zeyl, 1994; Birky, 1995). At 
the time of formation, IDLs contribute to the genetic diversity of an 
allopolyploid species by incorporating different parental genotypes (Soltis 
and Soltis, 1999), but the influence of recurrent origins on genetic and 
phenotypic variation in subsequent generations is less apparent. Studies on a 
handful of natural, recurrently formed allopolyploid angiosperms, for 
example, Tragopogon miscellus and T. mirus, suggest that IDLs have 
undergone similar patterns of homeolog evolution, resulting in similar 
patterns of gene expression (Tate et al., 2006, 2009; Koh, Soltis, and Soltis, 
2010; Buggs et al., 2009, 2010). For example, plants belonging to IDLs of T. 
miscellus, exhibit strong homeolog-expression bias, with preferential loss of 
expression of homeologs derived from their T. dubius parent (in many cases 
due to convergent homeolog loss; Tate et al., 2006, 2009; Buggs et al., 2010). 
Additional studies of recurrently formed allopolyploids, preferably with broad 
taxonomic representation, are needed to understand the predictability of 
homeolog evolution and expression. 

As the second largest lineage of vascular plants, ferns have a complex 
history of polyploidization events and are replete with reticulate species 
complexes (Haufler and Soltis, 1986; Barrington, Haufler, and Werth, 1989; 
Barker and Wolf, 2010; Sigel, 2016). Polypodium hesperium Maxon is a 
recurrently formed allotetraploid taxon and member of the well-studied 
Polypodium vulgare complex (Sigel, Windham, and Pryer, 2014). Native to 
western North America, P. hesperium is known to have formed reciprocally 
between two diploid species, P. amorphum Suksd. and P. glycyrrhiza D.C. 
Eaton, that belong to genetically and morphologically distinct clades (diverged 
approximately 12 mya, Sigel et al., 2014; average genetic divergence of 2%; 
Table $1). Polypodium hesperium comprises at least two IDLs with different 
maternally inherited plastids—populations north of 42°N have plastids 
inherited from P. amorphum (henceforth P. hesperium H*), whereas those 
south of 42°N have plastids inherited from P. glycyrrhiza (henceforth P. 
hesperium H®; Sigel, Windham, and Pryer, 2014). Geographic data and 
estimates of divergence times between P. amorphum, P. glycyrrhiza, and their 
respective sister taxa suggest that the IDLs of P. hesperium likely originated 
within the last 1 million years (Sigel, Windham, and Pryer, 2014). 

Moving toward the goal of understanding how ferns utilize the genetic 
diversity imparted by allopolyploidy and recurrent origins, we assessed 
patterns of gene expression between the IDLs of P. hesperium. Because 
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- Polypodium lacks a published genome, we adopted high throughput RNA- 
sequencing (RNA-Segq) to construct a well-annotated reference transcriptome 
for Polypodium and identify a suite of SNP markers between P. amorphum and 
P. glycyrrhiza. By mapping sequencing reads back to the reference tran- 
scriptome, we compare gene expression among P. hesperium and its diploid 
progenitors, and assess patterns of homeolog-specific expression in P. 
hesperium. Specifically, we ask the following questions: (1) Do the IDLs of 
P. hesperium exhibit similar patterns total gene expression? (2) Do the IDLs of 
P. hesperium exhibit similar patterns of homeolog-specific expression? (3) 
Does P. hesperium exhibit expression level dominance and homeolog 
expression bias comparable to other allopolyploid plants? 


MATERIAL AND METHODS 


Plant material.—For this study we used 12 Polypodium individuals—three 
individuals of P. amorphum, three individuals of P. glycyrrhiza, three 
individuals of P. hesperium H*, and three individuals of P. hesperium H® 
(Table 1). All individuals of P. hesperium were included in a previous study 
investigating its reciprocal origins in which their maternally inherited plastid 
haplotypes were confirmed by sequencing the trnG-trnR intergenic spacer 
(Sigel, Windham, and Pryer, 2014). 

All plants were collected from wild populations between August 2010 and 
August 2011, and transported live to the Duke University Greenhouses. Each 
plant was cleaned with water to remove soil from the roots and rhizomes and 
repotted in Farfard Mix 2 (Sun Gro Horticulture Canada Ltd., USA). Plants 
were maintained under common glasshouse conditions (photoperiod 18 h: 6 h, 
light: dark, with luminosity of 200-400 Umol sec ' cm’; 27-67% humidity; 
daytime temperature: 18.3-21.1°C; nighttime temperature: 17.8—20.6°C) for a 
minimum of 18 months prior to sampling for RNA extraction. Material from a 
single leaf was taken from each individual, flash frozen in liquid nitrogen 21 
days after it had fully unfurled but before sporangia had developed, and stored 
at -80°C until RNA extraction. Material for all individuals was collected 
between February 28 and March 5, 2013, and always between 10:00 and 11:00 
am Eastern Standard Time. 

RNA extraction, library construction, and Illumina sequencing.—Total RNA 
was extracted from 70-100 milligrams of leaf material per Polypodium 
individual using the Spectrum Plant Total RNA kit (Sigma-Aldrich, U.S.A.) 
without modification. The Aglient DNA 2100 Bioanalyzer (Aglient Technol- 
ogies, U.S.A.) were used to assess the quality and concentration of each RNA 
extraction. Twelve mRNA libraries, one for each Polypodium individual, were 
constructed using the TruSeq RNA Sample Prep Kit (Illumina, U.S.A). The 
standard protocol was modified to produce barcoded, strand-specific, paired- 
end libraries following Borodina, Adjaye, and Sultan (2011). Equimolar 
amounts of the 12 libraries were pooled into a single sample and submitted 
to the Duke University Genome Sequencing and Analysis Core (www.genome. 
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Taste 1. Polypodium specimens used in this study. DB numbers refer to unique individual 
identifiers as designated in the Duke Fern Lab Database (https://fernlab.biology.duke.edu). H® = 
plastid genome derived from Polypodium amorphum; H® = plastid genome derived from 
Polypodium glycyrrhiza. All vouchers are accessioned at DUKE herbarium. 


Taxon (ploidy) DB number Voucher information 
Polypodium amorphum Suksd. (2x) : 
P. amorphum 8521 Canada, British Columbia, Squamish-Lillooet Regional 
District, Rothfels 4084 
P. amorphum 1773 U.S.A., Oregon, Multnomah County, Sigel 2010-104 
P. amorphum Mis U.S.A., Washington, King County, Sigel 2010-125 
Polypodium glycyrrhiza D.C. Eaton (2x) 
P. glycyrrhiza 8493 Canada, British Columbia, Squamish-Lillooet Regional 
District, Rothfels 4059 
P. glycyrrhiza 7767 U.S.A., Washington, Snohomish County, Sigel 2010-81 
P. glycyrrhiza 7559 U.S.A, Oregon, Lincoln County, Rothfels 3875 
Polypodium hesperium Maxon (4x) 
P. hesperium H* 8179 U.S.A., Idaho, Shoshone County, Sigel 2011-46 
P. hesperium H* 8276 U.S.A., Montana, Lincoln County, Sigel 2011-31C 
P. hesperium H* 8280 U.S.A., Montana, Lake County, Sigel 2011-37 
P. hesperium H® 8175 U.S.A., Arizona, Coconino County, Sigel 2011-08A 
P. hesperium H® 8177 U.S.A., Arizona, Pinal County, Sigel 2011-09A 
P. hesperium H® 8288 U.S.A., Arizona, Graham County, Sigel 2011-10 


duke.edu). Sequencing of 100 base paired-end reads was performed on four 
lanes of the lumina HiSeg 2000 (IIlumina, U.S.A.). 

Processing of raw sequencing reads.—Reads from each library were sorted by 
barcode, and the number and quality of reads for each sample were evaluated 
with FastQC v. 3-5-2012 (Andrews, 2010). Adapter sequences and low-quality 
reads were removed using Trimmomatic v. 0.30 (Bolger, Lohse, and Usadel, 
2014). All read pairs for which both the forward and reverse read passed 
quality filtering were used in subsequent analyses. 

Generating a de novo Polypodium reference transcriptome.—The lack of a 
published genome for Polypodium or closely related genera required us to 
generate a de novo reference transcriptome as a prerequisite for assessing gene 
expression patterns. To minimize read mapping related technical bias, an 
initial reference transcriptome was generated using all filtered reads from the 
three individuals of P. amorphum and the three individuals of P. glycyrrhiza 
using Trinity v. r20140413 (Grabherr et al., 2011) on the Duke Shared Cluster 
Resource. A strand-specific protocol (RF) was used with a fixed k-mer size of 
25 and a maximum insert size of 800bp. All reads were then mapped back to 
the initial reference transcriptome using Bowtie v. 1.0.1 (Langmead et al., 
2009) and quantified with RSEM v. 1.2.14 (Li and Dewey, 2011) using the 
wrapper script align and_estimate_abundance.pl (Grabherr et al., 2011). 
Transcripts with less than three reads mapped back to them were considered 
putative artifacts of the transcriptome assembly algorithm and removed from 
the reference transcriptome (Grabherr et al., 2011; Haas et al., 2013). 
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To restrict differential gene expression analyses to protein coding genes of 
known Gene Ontology (GO) classifications (Ashburner et al., 2000), we first 
translated transcripts using ESTScan, a program that detects coding regions 
and corrects sequencing errors that cause frameshifts (Iseli, Jongeneel, and 
Bucher, 1999). We then annotated the reference transcriptome to orthogroup 
(i.e., narrowly defined gene lineages; Li, Stoeckert, and Roos, 2003) using the 
global PlantTribes gene family classification (Wall et a/., 2008) as described in 
Sigel et al. (2018). All transcripts without hits to the PlantTribes classification 
were excluded from the Polypodium reference transcriptome and not used in 
subsequent analyses. The completeness of the final Polypodium transcriptome 
assembly was assessed by quantifying the presence of universal single copy 
orthologs using BUSCO software and the Embryophyta odb09 dataset under 
the trans setting (Simao et al., 2015). 

Read mapping, quantification, and differential expression analysis.—For 
each of the 12 Polypodium individuals, filtered reads were mapped to the 
Polypodium reference transcriptome using Bowtie v. 1.0.1 (Langmead et al., 
2009) as implemented in the Trinity package v. r20140413 (Grabherr et al., 
2011), using parameters for paired-end, strand-specific reads (as described in 
Haas et al., 2013). Maximum likelihood read abundances were calculated for 
each gene using RSEM v. 1.2.13 (Li and Dewey, 2011), and the Trinity package 
script abundance_estimates_to_matrix.p] (Haas et al., 2013) was used to 
generate a combined matrix of read abundances of all genes for all samples. 
Prior to differential expression analyses, the combined read abundance matrix 
was filtered to remove genes that were not expressed consistently (i.e., 
expressed in all three biological replicates) by either P. amorphum and/or P. 
glycyrrhiza. Additive, or midparent (Grover et al., 2012), expression between 
P. amorphum and P. glycyrrhiza was calculated for each gene in silico by 
calculating the mean expression of the six diploid accessions. 

Differential expression analyses among individuals were performed using 
the edgeR release 2.14 in R v. 3.1.0 (Robinson, McCarthy, and Smyth, 2010) 
with the trimmed mean of M-values method (TMM) to normalize read counts 
within and across libraries (Robinson and Oshlack, 2010; Dillies et al., 2013). 
This results in gene expression values that are normalized to transcriptome 
size, effectively reducing gene expression estimates to per transcriptome 
concentrations rather than absolute expression. Differential expression 
analyses were performed in one of two ways: (1) for the three biological 
replicates of each IDL of P. hesperium and (2) for each of the six individuals of 
P. hesperium. For both analyses, differential expression of a gene was assessed 
by comparing the gene expression among P. hesperium, P. amorphum, P. 
glycyrrhiza, and the in silico midparent using Fisher’s exact test. The 
Benjamini and Hochberg (BH) method was used to adjust p-values to account 
for false discovery (FDR) of differentially expressed genes (Benjamini and 
Hochberg, 1995; Robinson and Oshlack, 2010). The number of genes exhibiting 
statistically significant differential expression among P. hesperium, P. 
amorphum, P. glycyrrhiza, and the in silico midparent were identified using 
thresholds of log2 fold change (FC) > 2 and FDR < 0.01. Significant differences 
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in the number of differentially expressed genes between the IDLs of P. 
hesperium and its diploid progenitors were determined with Fisher’s exact 
test, « = 0.01, as implemented in R (R Core Team, 2013). 

Expression level dominance.—Each gene identified as differentially ex- 
pressed between P. hesperium and at least one of its diploid progenitors was 
assigned to one of 12 possible categories in accordance with Rapp, Udall, and 
Wendel (2009), which includes multiple categories of additivity (midparent), 
expression level dominance (statistically identical to only one progenitor), and 
transgressive regulation (gene expression outside the range of either progen- 
itor). Significant differences in the number of genes belonging to each of the 12 
categories within and between the IDLs of P. hesperium were determined with 
Fisher’s exact test, « = 0.01, as implemented in R (R Core Team, 2013). 

Homeolog-specific expression.—One method for assessing subgenomic 
contributions to gene expression in allopolyploids uses fixed single nucleotide 
polymorphisms (SNPs) between progenitor species to identify homeologous 
gene copies in the allopolyploid. SNPs between progenitor species at the time 
of hybridization are vertically transmitted to the derived allopolyploid. Even 
after substantial evolutionary time, some SNPs will remain fixed between the 
diploid species and can be used to identify homeologous gene copies in the 
allopolyploid (Adams, 2007; Flagel and Wendel, 2009). Encouraged by 
numerous applications of this method to model and non-model allopolyploids 
(e.g., Buggs et al., 2010; Combes et al., 2013; Akama et al., 2014; Krasileva et 
al., 2013; Mithani et al., 2013; Nagy et al., 2013; Page et al., 2013; Cox et al., 
2014), we identified putative interspecific SNPs between the diploid species P. 
amorphum and P. glycyrrhiza to estimate homeolog-specific contributions to 
gene expression in the IDLs of P. hesperium. 

We used GATK Unified Genotyper in combination with previously 
generated bam alignment files to simultaneously identify SNPs between the 
Polypodium reference transcriptome and each of the 12 Polypodium 
individuals (McKenna et al., 2010; DePristo et al., 2011; Van der Auwera, 
2013). The -stand\_call\_conf and -stand\_emit\_conf parameters were set to 
30 and 10, respectively, to exclude low quality variant calls. The -ploidy 
parameter was set to 2 for each individual, regardless of whether it was diploid 
or allopolyploid, to reduce the complexity of variant calls and limit 
subsequent analyses to intergenomic homeologous SNPs in P. hesperium 
rather than intragenomic variation (i.e., between homologs). GATK Unified 
Genotyper yielded a single variant call format (VCF) file containing the 
estimated genotype at each SNP site for each Polypodium individual and the 
read depth of each allele. 

Custom scripts were used to identify putative fixed SNPs between the 
diploid progenitors, and calculate the homeolog expression ratios for each 
individual of P. hesperium. Briefly, SNPs were filtered to exclude sites with 
low mapping quality (Phred scaled probability score < 30) and genotype calls 
based on a low number of reads (< 4 reads). Putatively fixed SNPs between P. 
amorphum and P. glycyrrhiza were identified as having reciprocal homozy- 
gous genotypes in all replicates of each species (i.e., P. amorphum genotype = 
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0/0 and P. glycyrrhiza genotype = 1/1, or P. amorphum genotype = 1/1 and P. 
glycyrrhiza genotype = 0/0). The homeolog expression ratio for a particular 
gene for each individual of P. hesperium was estimated by dividing the 
number of P. amorphum-derived reads by the total number of reads for all 
SNPs assigned to that gene. A homeolog expression ratio of 1 indicates that 
expression for a given gene is solely comprised of homeologs derived from P. 
amorphum, whereas a homeolog expression ratio of 0 indicates that expression 
for a given gene is comprised solely of homeologs derived from P. glycyrrhiza. 
Intermediate ratios indicate that expression is comprised of homeologs derived 
from both P. amorphum and P. glycyrrhiza. 

For each gene expressed in all six individuals of P. hesperium, we estimated 
the total number of reads mapping to a P. amorphum-derived homeolog and a 
P. glycyrrhiza-derived homeolog by multiplying the total raw read counts as 
determined by RSEM v. 1.2.13 (Li and Dewey, 2011), by the homeolog 
expression ratio. EdgeR release 2.14 in R v. 3.1.0 (Robinson, McCarthy, and 
Smyth, 2010) was used to estimate FPKM values for homeolog-specific read 
counts (Robinson and Oshlack, 2010; Dillies et al., 2013). We then 
implemented two likelihood ratio tests (LRTs; Edger et al., 2017; Smith et 
al., 2019) in MATLAB v. 9.6.0 (MATLAB, 2019) to identify genes with 
statistically significant differences in homeolog-specific expression for P. 
hesperium H* and P. hesperium H®. This method determines if there is 
sufficient evidence to reject a null hypothesis of no homeolog expression bias 
in favor of an alternative hypothesis of homeolog expression bias for each gene, 
while accounting for relative read depth within and among P. hesperium 
individuals (Smith et al., 2019). Statistical significance was assessed at o=0.05, 
applying the BH correct for multiple testing error (Benjamini and Hochberg, 
1995). A third LRT test was conducted to identify genes with significant 
differences in the magnitude of homeolog expression bias between P. 
hesperium H®* and P. hesperium H® (Smith et al., 2019; see File S1 for 
MATLAB code and data tables). 


RESULTS 


Illumina sequencing reads.—Approximately 46-68 million 100bp paired- 
end reads were generated for each Polypodium individual. After removing 
adapter sequences and filtering out low quality reads, 87.9-89.7% of read pairs 
were retained, as were 4.9-6.2% of forward only reads and 1.9-2.1% of reverse 
only reads (Table S2). Raw Illumina reads, as well as nucleotide sequences and 
peptide translations of the final reference transcriptomes are deposited in 
NCBI’s Sequence Read Archive (www.ncbi.nlm.nih.gov/sra; BioProject: 
PRJNA549695). 

Polypodium reference transcriptome and annotation.—The initial Polypo- 
dium transcriptome comprised 420542 transcripts corresponding to 192418 
genes (e.g., Trinity components) with an N50 of 1560 bases and a mean contig 
length of 814.64 bases. Following filtering to remove potential assembly 
artifacts and any transcripts without significant similarity (e-value < 1e °) to 
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one or more orthogroup HMM profiles present in the PlantTribes classification, 
the final Polypodium transcriptome comprised 82893 transcripts correspond- 
ing to 25769 genes, with an N50 of 2300 bases and an average transcript length 
to 1756.44 bases (see Table S3 for assembly statistics and Files S2 and S3 for 
nucleotide sequences and peptide translations of the final reference tran- 
scriptome). A summary of orthogroup numbers and annotation terms 
associated with each of the 82893 transcripts in the final Polypodium reference 
transcriptome is provided in Table S4. BUSCO assessment determined that 
69.1% of universal single-copy Embryophyta orthologs are represented as 
complete sequences, with another 4.4% represented as partial sequences (Fig. 
S1). BUSCO representation in our final Polypodium reference transcriptome is 
similar to the previously published P. amorphum transcriptome (Sigel et al., 
2018) , 

Differential expression.—Of the 25769 genes in the Polypodium reference 
transcriptome, 19194 were consistently expressed (i.e., expressed in all three 
biological replicates) by either P. amorphum and/or P. glycyrrhiza and used for 
differential expression analyses (Table S5). Comparison of transcriptome- 
normalized gene expression in P. amorphum and P. glycyrrhiza using 
thresholds of FC > 2 and FDR < 0.01 revealed that 3716 genes (19.4%) were 
differentially expressed between the diploid species (Fig. 1), with significantly 
more genes highly expressed in P. amorphum (2712 genes, 14.1%) than were 
highly expressed in P. glycyrrhiza (1004 genes, 5.2%; p-value < 0.0001, 
Fisher’s exact test). 

Comparisons of transcriptome-normalized gene expression in P. hesperium 
H* and P. hesperium HF relative to their diploid progenitors recovered similar 
patterns of differential expression between the IDLs (Fig. 1a, b; also see Fig. S1 
for individual P. hesperium specimens). For the vast majority of genes 
surveyed, no significant differences in per transcriptome expression levels 
were detected between P. hesperium and either of its diploid progenitors. Both 
IDLs of P. hesperium had substantially less differentially expressed genes 
when compared with P. amorphum, than with P. glycyrrhiza or the in silico 
midparent. Put differentially, per transcriptome gene expression levels in both 
P. hesperium H* and P. hesperium H® more closely mirrored that of P. 
amorphum. Of the genes that were differentially expressed between P. 
amorphum and P. hesperium, significantly more were highly expressed in P. 
amorphum. In contrast, of those genes that were differentially expressed 
between P. glycyrrhiza and P. hesperium, significantly more were highly 
expressed in P. hesperium (Fig. 1). 

One striking difference between the IDLs of P. hesperium is the number of 
differently expressed genes relative to P. amorphum. Polypodium hesperium 
H® has significantly fewer genes that are differentially expressed with P. 
amorphum than does P. hesperium H* (H® :28 vs. H*: 638; Fig. 1). However, 
this pattern is not consistently recovered in analyses of individual P. 
hesperium accessions (Fig. S2), indicating that there is substantial variation 
in the number and identity of genes mirroring the expression levels of P. 
amorphum and P. glycyrrhiza within and between the IDLs of P. hesperium. 
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Fic. 1. Patterns of differential gene expression in the independently derived lineages of P. 
hesperium. A-B. Number of genes differentially expressed in each contrast between P. hesperium 
H* and H®, respectively, and each of its diploid progenitors at thresholds log2 fold change > 2 and 
FDR < 0.01. Bolded values indicate the total number and percentage of genes differentially 
expressed in each contrast. Values that are not bolded indicate the number of genes that are more 
highly expressed in a particular species for each contrast. For example, of the 19194 genes 
evaluated, 3716 (19.3%) are differentially expressed between P. amorphum and P. glycyrrhiza, 
with 2712 (14.1%) more highly expressed in P. amorphum and 1004 (5.2%) more highly expressed 
in P. glycyrrhiza. C. The number of genes belonging to 12 categories of differential expression 
among P. hesperium, P. amorphum, and P. glycyrrhiza. Roman numerals correspond to the 
category designations given in Rapp et al., (2009), and graph insets illustrate the expression level of 
P. hesperium (H) relative to P. amorphum (A) and P. glycyrrhiza (G). “DE A and G only” refers to 
genes with statistically significant differences in gene expression between P. amorphum and P. 
glycyrrhiza but not P. hesperium. “No Change” refers to genes without a statistically significant 
change in gene expression between all three species. 


P. glycyrrhiza- 


Expression level dominance.—Following the criteria of Rapp, Udall, and 
Wendel (2009), differentially expressed genes were binned into 12 categories 
reflecting the per transcriptome expression level in P. hesperium relative to its 
diploid progenitors (Fig. 1; see Fig. S3 for individual P. hesperium specimens). 


234 AMERICAN FERN JOURNAL: VOLUME 109, NUMBER 3 (2019) 


There were no significant changes in expression among P. hesperium (for both 
H® and H®), P. glycyrrhiza, and P. amorphum for the vast majority of the 19194 
genes surveyed. For both IDLs of P. hesperium, of the genes with differential 
expression relative to the diploid progenitors, the vast majority exhibited 
expression level dominance mirroring P. amorphum. Notably, this was true for 
most genes in which expression is upregulated in P. amorphum relative to P. 
glycyrrhiza (category IV). Both IDLS had very few genes exhibiting additive 
expression (categories I and XII) and transgressive regulation (categories III, V, 
VI, VII, VUI, X). Similarly, a broad pattern of expression level dominance 
favoring P. amorphum over P. glycyrrhiza, is present for each of the six 
individual accessions of P. hesperium regardless of IDL, with substantial 
variation in the number and identity of genes in each expression category (Fig. 
53). 

Homeolog-specific expression.—A total of 935,207 SNPs were detected 
among the 12 Polypodium samples (File S4). We were able to identify 5564 
putatively fixed SNPs between the diploid species. By combining genotype and 
read depth information from all SNPs assigned to a single gene, we identified 
1073 genes for which both P. amorphum-derived homeologs and P. glycyrrhiza- 
derived homeologs were expressed in all six individuals of P. hesperium. 

For a majority of these 1073 genes both P. hesperium H* and P. hesperium H® 
exhibited strong, statistically significant homeolog expression bias favoring P. 
amorphum-derived homeologs; 923 (86.0%) and 744 (69.3%) genes in P. 
hesperium H* and P. hesperium H®, respectively. For P. hesperium H* and P. 
hesperium H®, P. amorphum-derived homeologs were expressed, on average, 
approximately 8 and 12 times greater (H*: FC=3.06; H®: FC= 3.62) than P. 
glycyrrhiza-derived homeologs (Fig. 2a,b). 

In contrast, both IDLs of P. hesperium exhibited few genes with strong 
expression bias in favor of P. glycyrrhiza-derived homeologs (e.g., H*: 100 
genes, H®: 39 genes). For these genes, there is a notable difference between P. 
hesperium H* and P. hesperium H® in the magnitude of their homeolog 
expression biases. On average, P. hesperium H* exhibited a 3-fold bias (FC = 
1.54) in favor of P. glycyrrhiza-derived homeologs, whereas P. hesperium H® 
exhibited a 10-fold bias (FC = 3.31) in favor of P. glycyrrhiza-derived 
homeologs (Fig. 2a,b). 

When comparing homeolog expression biases between the IDLs of P. 
hesperium at the level of specific, individual genes, we found that the majority 
(81%) of 1073 genes surveyed did not exhibit a statistically significant 
difference in the magnitude or direction of the expression bias (Fig. 2c). Of the 
209 genes that differed in homeolog expression bias between the IDLs, 128 
exhibited preferential expression of P. glycyrrhiza-derived homeologs in P. 
hesperium H* relative to P. hesperium H*. 


DISCUSSION 


Allopolyploid species are prevalent in both flowering plants and ferns 
(Wood et al., 2009), but studies investigating gene expression in allopolyploids 
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Fic. 2. Results of likelihood ratio tests for homeolog expression bias (HEB) in P. hesperium for 
1073 genes (BH-adjusted p-value < 0.05). HEB is reported as the log2 fold change in expression 
between homeologs, with negative values indicating a bias toward P. amorphum-derived 
homeologs and positive values indicating a bias toward P. glycyrrhiza-derived homeologs. A-B. 
Histograms of the distribution of HEB in P. hesperium H* and P. hesperium H®, respectively. Genes 
significantly biased toward the expression of P. amorphum-derived homeologs are shown in red, 
genes significantly biased toward the expression of P. glycyrrhiza-derived homeologs are shown in 
blue, and non-biased genes are shown in gray. C. Distribution of change in HEB between P. 
hesperium H* and P. hesperium H*. Genes with significantly greater HEB favoring the P. 
amorphum-derived homeolog in P. hesperium H® relative to P. hesperium H® are shown in red, 
whereas genes with significantly greater HEB favoring the P. glycyrrhiza-derived homeolog in P. 
hesperium H* relative to P. hesperium H* are shown in blue. Genes without significant change in 
HEB are shown in gray. D-C. Transcriptome-normalized expression levels of P. amorphum-derived 
homeologs and P. glycyrrhiza-derived homeologs in P. hesperium H* and P. hesperium H®, 
respectively. Color coding is the same as in panels A and B. E. HEB in P. hesperium H* and P. 
hesperium H®. Color coding is the same as in panel C. 
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have been largely limited to angiosperms. Here we broaden the evolutionary 
perspective by using RNA-Seq to assess patterns of gene expression in 
Polypodium hesperium, an allotetraploid fern taxon comprising reciprocally- 
formed IDLs. Recurrent origins increase the genetic diversity of an allopoly- 
ploid species by integrating variation from different progenitor populations 
(Werth and Lellinger, 1992; Soltis and Soltis, 1999), but little is known about 
how recurrent origins contribute to variation in gene expression. We find that 
the IDLs of P. hesperium are remarkably consistent in their transcriptome-wide 
patterns of gene expression and homeolog expression biases. However, we also 
find substantial variation in the expression of particular genes and among 
individuals of P. hesperium, suggesting that recurrent origins may contribute 
to phenotypic variation of an allopolyploid taxon. In addition, we report 
unprecedented levels of unbalanced expression level dominance and 
unbalanced homeolog expression bias, supporting the notion that these 
phenomena are pervasive consequences of allopolyploidy in plants. 

Additive gene expression in P. hesperium.—It is widely reported that the 
expression levels of many genes in angiosperm allopolyploids deviate from 
additive, or midparent, values (e.g., Wang et al., 2006; Chagué, et al., 2010; 
Yoo, Szadkowski, and Wendel, 2013; Zhao et al., 2013, Jiang et al., 2015). 
However, for the vast majority of genes surveyed for both P. hesperium H* and 
P. hesperium H® we failed to recover support for per transcriptome gene 
expression levels that significantly deviate from in silico midparent values (H’: 
1.9%; H®: 2.4%; Fig. 1a,b). While both ILDs of P. hesperium did have many 
non-additively expressed genes—2.4-4.6% of genes significantly differed from 
the in silico midparent value—our results are lower than reported for other 
allopolyploid taxa. For example, the percentage of non-additive genes in 
synthetic Arabidopsis (Wang et al., 2006), wheat (Chagué, et al., 2010), and 
Brassica (Zhao et al., 2013) allopolyploids ranged from 4.5-7.8%, with even 
higher percentages in naturally-formed cotton allotetraploids (16.3-18.3%, 
Yoo, Szadkowski, and Wendel, 2013) and Brassica napus (16%; Zhang et al., 
2016). 

The low percentage of non-additively expressed genes in P. hesperium 
relative to other allopolyploids largely reflects that the vast majority of genes 
surveyed in P. hesperium H* and P. hesperium H® have per transcriptome 
expression levels that do not significantly differ from both P. amorphum and P. 
glycyrrhiza (79.0% and 80.2%, respectively; Fig. 1c). This lack of differential 
expression among all three taxa, or “no change”, is a special case of additivity 
or midparent expression (Rapp, Udall, and Wendel, 2009), and may result from 
low levels of sequence divergence (2-3% divergence; Table S1) between P. 
amorphum and P. glycyrrhiza. However, the percentage of differentially 
expressed genes between P. amorphum and P. glycyrrhiza is less than that 
recovered from comparable studies of natural allopolyploid angiosperm 
complexes with similar levels of divergence. For example, the diploid 
progenitors of Gossypium and Coffea allopolyploids exhibit differential 
expression in 42-53% and 36-55% of genes, respectively (Rapp, Udall, and 
Wendel, 2009; Bardil et al., 2011; Yoo, Szadkowski, and Wendel, 2013). This 
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suggests that in Polypodium relative gene expression (i.e., per transcriptome 
expression) is largely conserved both between divergent diploid taxa, as well 
as following allopolyploidization, and seemingly at levels much higher than 
reported for angiosperm allopolyploid complexes. 

Unbalanced expression level dominance in P. hesperium.—Expression level 
dominance, the condition in which the expression of a gene in an 
allopolyploid mirrors that of one of its progenitor species but not the other, 
occurs in many taxa including Brassica (Zhang et al. 2016), Spartina (Chelaifa, 
Monnier, and Ainouche, 2010), cotton (Rapp, Udall, and Wendel, 2009; Flagel 
and Wendel, 2010), and wheat (Chagué et al., 2010; Li et al., 2014). Here we 
report expression level dominance for 7.7% and 8.8% of the genes surveyed in 
P. hesperium H* and P. hesperium H®, respectively (Fig. 1c, categories I, IV, 
IX, and XJ). Despite substantial differences in the identity of specific genes 
exhibiting expression level dominance between the IDLs (Fig. 1c), both P. 
hesperium H* and P. hesperium H® exhibit per transcriptome expression levels 
that almost exclusively mirror those of P. amorphum. This is particularly true 
for genes that are highly expressed by P. amorphum relative to P. glycyrrhiza. 
Strikingly, we recovered extremely low numbers of genes exhibiting 
transgressive expression in P. hesperium H* and P. hesperium H?® (Fig. 1c, 
categories III, V—VIII, X). Hence, genes that are differentially expressed 
between P. hesperium and one progenitor almost exclusively exhibit 
unbalanced expression-level dominance biased toward P. amorphum (85% 
for H* and 98% for H®). Put differently, both P. hesperium H* and P. hesperium 
H® have per transcriptome gene expression patterns more similar to those of P. 
amorphum, suggesting that P. amorphum is the dominant progenitor dictating 
gene expression in P. hesperium (Rapp, Udall, and Wendel, 2009; Grover et al., 
2012; Buggs et al., 2014). This result is in line previous studies that report that 
the dominant progenitor is commonly the more highly expressed (Steige and 
Slotte, 2016; Bottani et al. 2018)—approximately 2.7X more genes are up- 
regulated in P. amorphum than in P. glycyrrhiza (Fig. 1a, b). 

The extent of expression level dominance reported here is unusual. We can 
find only one example in which the proportion of genes exhibiting expression 
level dominance exceeds 25% (Wu et al. (2018) reports 74% in resynthesized 
Brassica napus) and no other examples with so few occurrences of 
transgressive regulation (e.g., Rapp et al., 2009; Chagué et al., 2010; Chelaifa 
et al., 2010; Flagel & Wendel 2010; Bardil et al., 2011; Yoo, Szadkowski, and 
Wendel, 2013; Cox et al., 2014; Zhang et al., 2016; Wu et al., 2018). Surprised 
by these findings and impeded by a general lack of knowledge about genome 
evolution and regulation in ferns, it is unclear why P. hesperium has such high 
expression level dominance relative to most angiosperm allopolyploids. 
Polypodium hesperium, and perhaps ferns in general, may exhibit extreme 
manifestations of the factors thought to promote expression level dominance 
(see discussion below). 

While the degree of expression level dominance reported here is intriguing, 
it is necessary to include an important caveat when interpreting these results. 
We report expression levels for each individual normalized to its tran- 
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scriptome size, with expression of individual genes represented as a fraction of 
total expression (Robinson and Oshlack, 2010). While transcriptome-normal- 
ized expression is standard for studies using RNA-Seq to explore gene 
expression in allopolyploids relative to their diploid progenitors (e.g., Shi et 
al., 2012; Combes et al., 2013; Page ‘et al., 2013; Yoo, Szadkowski, and Wendel, 
2013; Cox et al., 2014; Xu et al., 2014; Wu et al., 2018), it ignores the potential 
for variation in transcription per cell and implicitly assumes that tran- 
scriptome size is constant among taxa. Transcriptome-normalized expression 
reflects transcript concentration and should not automatically be equated with 
absolute expression (Coate and Doyle, 2010, 2015; Loven et al., 2012; Buggs et 
al., 2014; Visger et al., 2019). If the total number of transcripts per cell differs 
between individuals, as may be the case with a polyploid taxon and its diploid 
progenitors, a gene that does not exhibit differential expression between taxa at 
the transcriptome-level may exhibit differential expression at the cell-level, or 
vice versa. Direct quantification of transcription per cell (e.g., by extracting 
RNA from known a number of cells in combination with spiked-in exogenous 
RNA; Visger et al., 2019) is necessary to convert expression per transcriptome 
to absolute expression per cell. 

Only two studies of diploid-polyploid changes in gene expression changes 
have accounted for variation in total transcription per cell: Coate and Doyle 
(2010) report that the average transcription per cell of the allotetraploid 
Glycine dolichocarpa is 1.4X that of its diploid progenitors, and Visger et al. 
(2019) report that the autotetraploid Tolmiea menziesii has average per cell 
transcription levels 2.1 that of the diploid T. diplomenziesii. Hence, doubling 
of ploidy does not necessarily lead to a doubling of transcription per cell. 
Transcription per cell may vary widely among taxa, and to the best of our 
knowledge, no study has ever quantified absolute expression per cell in 
polyploid and diploid ferns. Interestingly, Visger et al. (2019) reports that 61% 
of differential expressed genes between T. menziesii and T. diplomenziesii 
detected with per cell-normalized expression data were also considered 
differentially expressed using transcriptome-normalized data. These genes 
have high magnitudes of per cell differential expression between the diploid 
and polyploid taxa, suggesting that transcriptome-normalization methods may 
be adequate for detecting genes with high-levels of absolute differential 
expression (enough to change the per transcriptome concentrations) but fail to 
detect genes with low-levels of absolute differential expression. 

While we are confident the results of this study are comparable to most 
studies utilizing transcriptome normalized RNA-Seq data, the results of all 
these studies must be approached judiciously. In the case of P. hesperium and 
many other RNA-Seq studies of allopolyploid plants using transcriptome- 
normalized data, the number of genes exhibiting mid-parent expression levels 
and expression level dominance may be inflated. By utilizing both tran- 
scriptome-normalization and per cell-normalization methods in concert, 
future studies comparing gene expression in diploid and polyploid ferns 
could provide broader insights into gene dosage responses, the conservation of 
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expression levels with changes in biomass, and selection on absolute 
expression levels (Visger et al., 2019; Doyle and Coate, 2019). 

Homeolog expression bias in P. hesperium.—To determine how gene copies 
derived from different progenitors contribute to total gene expression in P. 
hesperium, we examined homeolog-specific expression for 1073 genes in both 
IDLs of P. hesperium. By identifying SNPs that are putatively fixed between P. 
amorphum and P. glycyrrhiza, we were able to infer the parental origin of 
homeologs in P. hesperium and calculate their relative expression. Just as we 
discovered unbalanced expression level dominance with P. amorphum as the 
dominant parent, we recovered unbalanced homeolog expression bias with a 
disproportionate number of genes in P. hesperium preferentially expressing P. 
amorphum-derived homeologs (Fig. 2). It is important to note that there are 
differences in the degree and direction of homeolog expression bias between 
the IDLs. For example, P. hesperium H* has more genes with homeolog 
expression bias than P. hesperium H® (1023 vs. 783), but P. hesperium H® has a 
larger average magnitude of bias favoring P. amorphum-derived homeologs 
(12-fold vs. 8-fold in H“, Fig. 2a, b). Furthermore, at the level of specific genes, 
P. hesperium H® tends to express more P. glycyrrhiza-derived homeologs (Fig. 
2c). While it appears that all individuals of P. hesperium, regardless origin, 
generally exhibit a strong bias for the expression of P. amorphum-derived 
homeologs across the transcriptome, statistically significant differences in 
homeolog expression bias between the IDLs of P. hesperium for individual 
genes (Fig. 2f), suggest that homeolog expression in these genes may be 
dictated by lineage-specific factors. 

Preferential expression of homeologs derived from a particular progenitor is 
commonly observed in other allopolyploids (e.g., Arabidopsis: Wang et al., 
2006; Coffea: Combes et al., 2013; Glycine: Coate, Bar, and Doyle, 2014; 
Gossypium: Chaudary et al., 2009; Flagel and Wendel, 2009, 2010; Rapp, Udall, 
and Wendel, 2009; Mimulus: Edger et al., 2017; Tragopogon: Tate et al., 2006; 
Buggs et al., 2010; Koh, Soltis, and Soltis, 2010; Zea: Schnable, Springer, and 
Freeling, 2011), and appears to be a prevalent consequence of allopolyploidiza- 
tion. Once again though, the percentage of genes exhibiting significant homeolog 
expression bias in P. hesperium is greater than in most surveyed allopolyploids. 
To the best of our knowledge, Tragopogon miscellus and T. mirus are the only 
other allopolyploids with IDLs for which homeolog-specific expression has been 
investigated. As in P. hesperium, all IDLs of both T. miscellus and T. mirus 
preferentially express homeologs derived from the same progenitor (Tate et al., 
2006, 2009; Koh, Soltis, and Soltis, 2010; Buggs et al., 2009, 2010). These 
findings suggest that recurrent origins of natural allopolyploids consistently 
yield similar transcriptome-wide patterns of homeolog expression. 

Possible mechanisms for expression level dominance and homeolog 
expression bias in P. hesperium.—Both expression level dominance and 
homeolog expression bias are broadly observed phenomena, prompting much 
recent interest in underlying genomic and regulatory mechanisms (e.g., 
Hollister and Gaut, 2009; Freeling et al., 2012; Steige and Slotte, 2016; Bird 
et al., 2018; Bottani et al., 2018; Wendel et al., 2018). While little is known 
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about gene regulation and genome evolution in ferns (Sessa et al., 2014; Sessa 
and Der, 2016), mechanisms in angiosperm allopolyploids may help explain 
patterns seen in P. hesperium. The prevailing explanation for expression 
biases in allopolyploids is a difference in the number and distribution of 
transposable elements (TEs) in their progenitor-derived subgenomes. TEs 
inserted in or near genes become the targets of RNA-directed DNA methylation 
and heterochromatic modification, causing the repression of gene expression 
(Parisod et al., 2010; Vicient and Casacuberta, 2017). Furthermore, it has been 
demonstrated in Arabidopsis allopolyploids that TE silencing leads to cis- 
regulatory variation (Shi et al., 2012). Under this scenerio, the subgenome with 
the lower density of TEs would be more likely to have higher gene expression 
and be the dominant progenitor genome dictating total and homeolog-specific 
gene expression levels in an allopolyploid (Bottani et al., 2018). Ultimately, 
the homeologs from the repressed progenitor subgenome may become lost, 
resulting in biased fractionation of the allopolyploid genome (Steige and 
Slotte, 2016). Empirical evidence for the role of TE mediated gene expression 
biases has been found in allopolyploids and F1 hybrids of Brassica (Wood- 
house et al., 2014) and Zea (Pophaly and Tellier, 2015). However, little 
evidence was found for the role of TE gene silencing in Capsella (Steige et al., 
2016) and Gossypium (Renny-Byfield et al., 2015), suggesting that other 
processes may contribute to expression biases. : 

Looking ahead, P. hesperium offers an ideal opportunity to test for a 
correlation between the divergence of TE abundance and distribution between 
subgenomes and expression biases in allopolyploid ferns. Specifically, we 
hypothesize that its P. glycyrrhiza-derived subgenome will harbor a greater TE 
load than its P. amorphum-derived genome, particularly near genes exhibiting 
preferential expression of P. amorphum-derived homeologs. By surveying the 
genomes of P. amorphum, P. glycyrrhiza, and both IDLs of P. hesperium, we 
may be able to determine if the abundance and distribution of TEs in the P. 
hesperium subgenomes reflects preexisting differences in its progenitor 
species or is due to the expansion of TEs following allopolyploidization, as 
has been observed for some angiosperms (e.g., Vicient and Casacuberta, 2017; 
Mhiri et al., 2018; Palacios et al., 2019). Furthermore, we will be able to assess 
the fidelity of the underlying mechanisms resulting in the similar tran- 
scriptome-scale patterns of gene expression between P. hesperium H®* and P. 
hesperium H® reported here. 
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Asstract.—Nuclear genome size is highly variable in vascular plants. The composition of long 
terminal repeat retrotransposons (LTR-RTs) is a chief mechanism of long term change in the 
amount of nuclear DNA. Compared to flowering plants, little is known about LTR-RT dynamics in 
ferns and lycophytes. Drawing upon the availability of recently sequenced fern and lycophyte 
genomes we investigated these dynamics and placed them in the context of vascular plants. We 
found that similar to seed plants, median LTR-RT insertion times were strongly correlated with 
haploid nuclear genome size. Fern and lycophyte species with small genomes such as those of the 
heterosporous Selaginella and members of the Salviniaceae had recent median LTR-RT insertion 
times, whereas species with large genomes such as homosporous ferns had old median LTR-RT 
insertion times intermediate between angiosperms and gymnosperms. This pattern holds despite 
methylation and life history differences in ferns and lycophytes compared to seed plants, and our 
results are consistent with other patterns of structural variation in fern and lycophyte genomes. 


Key Worps.—Long Terminal Repeat Retrotransposons, Monilophytes, Pteridophytes, Selaginella 


Vascular plants exhibit an astonishing diversity of genome sizes relative to 
other eukaryotes. Plant genomes vary more than 2,400 fold in size, from the 
minute genomes of Genlisea tuberosa [1C=0.06 Gbp (Fleischmann et al., 2014; 
Rivadavia, Gonella, and Fleischmann, 2013)] to the extremely large genomes of 
Paris japonica |1C=152.2 Gbp (Pellicer, Fay, and Leitch, 2010)]. The rate of 
plant genome size evolution appears to be lineage specific. For example, the 
genomes of Genlisea spp. range from 0.06—1.7 Gbp, with recent reductions in 
the smallest genomes following recent whole genome duplication (Fleisch- 
mann et al., 2014). This stands in contrast to the small genomes of the 
Selaginellaceae that have a narrower range of genome size from 1C 0.082-0.184 
Gbp (Baniaga, Arrigo, and Barker, 2016) despite deep divergences among 
subgenera dating to the Permian-Triassic boundary (Arrigo et al., 2013; 
Kenrick and Crane, 1997; Korall and Kenrick, 2002; Westrand et al., 2016; 
Zhou et al., 2016). 
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The primary mechanisms driving variation in genome size in vascular plants 
are whole genome duplication (WGD) and the proliferation of transposable 
elements. Although WGDs are common and phylogenetically widespread 
throughout the history of vascular plants (Barker et al., 2016; Cui et al., 2006; 
Jiao et al., 2011; Kagale et a/., 2014; Landis et al., 2018; Li et al., 2015; Li et al., 
2018), genome size is often reduced following polyploidization by fraction- 
ation and diploidization (Freeling et al., 2012; Murat et al., 2014). This 
reduction process may also be partially due to species specific maximum 
genome size thresholds (Zenil-Ferguson, Ponciano, and Burleigh, 2016). 
Consequently, most of the variation in genome size is caused by the expansion 
and contraction of Class I transposable elements, specifically Long Terminal 
Repeat retrotransposons (LTR-RTs) which increase in copy number via a copy- 
and-paste mechanism (Feschotte, Jiang, and Wessler, 2002). LTR-RTs can 
comprise a majority of a vascular plant genome (Michael, 2014), and total LTR- 
RT content is positively correlated with genome size in angiosperms (Flavell, 
1980; Michael, 2014; Tiley and Burleigh, 2015; Wendel et al., 2016). For 
example, in the extremely small genome of Utricularia gibba, LTR-RTs 
comprise only 2% of the genome (Ibarra-Laclette et a/., 2013), but in Zea 
mays they comprise more than 75% of the genome (Schnable et al., 2009). 
These dynamics lead LTR-RTs to be the dominant driver of genome size 
evolution in vascular plants. 

Comparative analyses of LTR-RT insertion times (measured backward from 
the present) can be used to reveal the dynamics of LTR-RTs and their impacts 
on genome evolution. It is possible to estimate LTR-RT insertion times because 
LTR-RTs have identical 5’ and 3’ ends at initial insertion. After insertion, they 
evolve at a neutral rate and, given an estimate of the synonymous substitution 
rate for a lineage or taxon, the divergence can be used to date LTR-RT insertion 
time (SanMiguel et al., 1998). Most angiosperms have young and active LTR- 
RTs inserted 0—4 mya with few full length LTR-RTs recovered beyond 4 mya in 
monocots and eudicots (El Baidouri and Panaud, 2013). Exceptions to this 
pattern are found in the genomes of Fritillaria spp. These plants have some of 
the largest known angiosperm genomes, and are characterized by the slow 
accumulation of diverse repeat elements without their subsequent removal 
(Kelly et al., 2015). Similar LTR-RT age distributions are found in 
gymnosperms, which also have large genome sizes. Gymnosperm genomes 
are characterized by numerous diverse repeats with a uniform distribution of 
LTR-RT insertion times dating back to more than 60 mya (Kovach et al., 2010; 
Nystedt et al., 2013; Voronova et al., 2017; Zuccolo et al., 2015). The 
persistence of LTR-RTs over millions of years is a common feature of the 
largest genomes of seed plants. 

In contrast to seed plants, we know relatively little about the dynamics of 
LTR-RTs and genome size evolution in the ferns and lycophytes. Fern nuclear 
genome size ranges from 1C=250 Mbp in the heterosporous water fern Salvinia 
cucullata (Li et al., 2018) to 1C=147.3 Gbp in Tmesipteris obliqua (Hidalgo et 
al., 2017). On average, most ferns have large genome sizes with a mean of 
1C=13.98 Gbp (Clark et al., 2016), intermediate between those of gymnosperms 
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and angiosperms. Genome sizes in lycophytes range from the extremely small 
genomes of 1C=81.5 Mbp in the spike-moss Selaginella selaginoides (Baniaga, 
Arrigo, and Barker, 2016) to 1C=11.7 Gbp in the quillwort Isoetes lacustris 
(Hanson and Leitch, 2002), with an average of 1.66 Gbp for lycophytes (Leitch 
and Leitch, 2013). Unlike seed plants, nuclear genome size in ferns and 
lycophytes correlates strongly with chromosome number (Bainard et al., 2011; 
Barker, 2013; Clark et al., 2016; Leitch and Leitch, 2013; Nakazato et al., 2008; 
Sessa and Der, 2016). We may thus expect that LTR-RTs play a lesser role in 
the evolution of fern and lycophyte genome size compared to seed plants. 

Investigations of LTR-RT dynamics in ferns and lycophytes are in their 
infancy, but have already revealed some important details. Generally, the 
proportion of the genome comprised of LTR-RTs is within the range observed 
in seed plants (Banks et al., 2011; Li et al., 2018; Van Buren et al., 2018; Wolf et 
al., 2015; Xu et al., 2018). LTR-RT insertion times have only been estimated for 
two heterosporous ferns and species of Selaginella. In the heterosporous ferns, 
which have much smaller genome sizes than their homosporous relatives, a 
high amount of recent activity is found with relatively low amounts of genetic 
divergence between LTR 5’ and 3’ ends (Li et al., 2018). In Selaginella the 
majority of LTR-RTs are also relatively young (Banks et a/., 2011; Van Buren et 
al., 2018; Xu et al., 2018), and are strongly linked to the high haplotype 
variation observed in S. lepidophylla (Van Buren et al., 2018). The role of LTR- 
RTs in influencing haplotype variation in Selaginella may explain the high 
heterozygosity and assembly difficulties of other Selaginella genomes despite 
their minute nature. 

Given that chromosome number is strongly correlated with nuclear genome 
size in ferns and lycophytes (Bainard et al., 2011; Barker, 2013; Clark et al., 
2016; Leitch and Leitch, 2013; Nakazato et al., 2008), we sought to test if 
genome size is also well correlated with LTR-RT insertion activity. With the 
advent of newly available genome-wide data for several fern and lycophyte 
taxa (Banks et al., 2011; Li et al., 2018; Van Buren et al., 2018; Wolf et al., 2015; 
Xu et al., 2018), we were able to explore the LTR-RT dynamics of ferns and 
lycophytes. If genome size is correlated with median LTR-RT insertion time, 
then we predict that 1) homosporous ferns should have intermediate LTR-RT 
insertion times between angiosperms and gymnosperms, and 2) the hetero- 
sporous Selaginella with their extremely small genomes and the relatively 
small genomes of the heterosporous water ferns should have recent LTR-RT 
insertion times comparable to angiosperms. 


MATERIALS AND METHODS 


LTR-RT discovery.—Genome assemblies used in analyses were downloaded 
from published and publicly available datasets (Goodstein et al., 2012; Li et al., 
2018; Van Buren et al., 2018; Wan et al., 2018; Wolf et al., 2015; Xu et al., 
2018). Six homosporous fern taxa (Ceratopteris richardii, Cystopteris protrusa, 
Dipteris conjugata, Plagiogyria formosana, Polypodium glycyrrhiza, and 
Pteridium aquilinum), two heterosporous fern taxa (Azolla filiculoides and 
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TABLE 1. 


genome comprised of LTR-RTs was sourced from the literature. 


201 


Summary of LTR-RT Composition in Fern and Lycophyte Genomes. Percent of the 


Haploid 
nuclear 
genome 
Percent size 
Taxon Authority Family LTR-RT (Mbp) Reference 
Azolla Lam. Salviniaceae 37.8 750 ~=+Li et al., 2018 
filiculoides 
Ceratopteris Brongn. Pteridaceae 37.6 11,205 Wolf et al., 2015 
richardii 
Cystopteris (Weath.) Blasdell Cystopteridaceae 31.4 4,230 Wolf et al., 2015 
protrusa 
Dipteris Reinw. Dipteridaceae 22.6 2,450 Wolf et al., 2015 
conjugata 
Plagiogyria Nakai Plagiogyriaceae 24.9 14,810 Wolf et al., 2015 
formosana 
Polypodium D. C. Eaton Polypodiaceae 28.5 10,020 Wolf et al., 2015 
glycyrrhiza } 
Pteridium (L.) Kuhn Dennstaedtiaceae 27.8 15,650 Wolf et al., 2015 
aquilinum 
Salvinia Roxb. Salviniaceae 19.2 260 ~—Li et al., 2018 
cucullata 
Selaginella (Hook. & Grev.) Selaginellaceae 13.07 109 Van Buren et al., 
lepidophylla Spring 2018 
Selaginella Hieron. Selaginellaceae 35.6 86 Banks et al., 2011 
moellendorffii 
Selaginella (P. Beauv) Spring Selaginellaceae 25.47 150 Xuetal., 2018 
tamariscina 


Salvinia cucullata), and three lycophyte taxa (Selaginella lepidophylla, S. 
moellendorffii, and S. tamariscina) were analyzed. Only Selaginella taxa were 
available to represent lycophytes. In general, the proportion of the genome 
comprised of LTR-RTs in ferns and lycophytes is similar to that observed in 
other seed plants (Table 1). In addition to the seven fern taxa and three species 
of Selaginella, five representative angiosperms (Amborella trichopoda, 
Sorghum bicolor, Arabidopsis thaliana, Erythranthe guttata, and Solanum 
lycopersicum) and three representative gymnosperms (Gnetum montanum, 
Picea abies, and Pinus sylvestris) were included in our analysis. 

Assembled genomes were scanned using LTR_Finder (Xu and Wang, 2007) 
with the following parameters; “Maximum distance between LTRs=15000”, 
“Minimum distance between LTRs=1000”, “Maximum LTR length=3500”, 
“Minimum LTR length=100”, “Length of exact match pairs=15”, “Extension 
cutoff=0.5”, “Reliable extension=0.5”. Nested LTRs were excluded from all 
analyses. Predicted full length LTR-RTs were extracted from scaffolds using 
custom perl scripts (https://bitbucket.org/barkerlab/baniaga_fern_lycophyte_ 
ltrs) and blasted to RepBase (Bao, Kojima, and Kohany, 2015) using the tblastx 
algorithm. The highest scoring hit to the repbase database from each predicted 
LTR-RT sequence was used in annotations. Predicted full length LTRs with no 
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TasL_e 2. Number of LTR-RTs recovered for each species. LTR-RTs with evidence of intraelement 
gene conversion (IGC) reported and excluded from further analyses. 


LTR-RTs Full length 
Taxon Authority with IGC LTR-RTs 
Selaginella lepidophylla (Hook. & Grev.) Spring 190 797 
Selaginella moellendorffii Hieron. 687 bed Wg 8} 
Selaginella tamariscina (P. Beauv) Spring 789 2675 
Azolla filiculoides Lam. 3701 17186 
Ceratopteris richardii Brongn. 8 125 
Cystopteris protrusa (Weath.) Blasdell 4 33 
Dipteris conjugata Reinw. 7 76 
Plagiogvria formosana Nakai 10 74 
Pteridium aquilinum (L.) Kuhn 47 391 
Salvinia cucullata Roxb. 303 1711 
Gnetum montanum Markgr. 3383 10977 
Picea abies (L.) H. Karst 4549 36868 
Pinus sylvestris L. 3 89 
Amborella trichopoda Baill. 160 1166 
Arabidopsis thaliana (L.) Heynh. 56 286 
Erythranthe guttata (Fisch. Ex DC) G. L. Nesom 533 2535 
Solanum lycopersicum L. 723 3916 
Sorghum bicolor (L.) Moench 2549 12980 


significant hit to RepBase were blasted to the GenBank nr database using the 
tblastx algorithm, and significant hits (e-value < 1e °) to protein coding genes 
were removed. A custom perl script was used to extract 5’ and 3’ LTR pairs 
from sequences based on their predicted start and end location from the 
LTR_Finder output (https://bitbucket.org/barkerlab/baniaga_fern_lycophyte_ 
ltrs). Following this filtering, only eight putative LTR-RTs were recovered for 
Polypodium glycyrrhiza and this species was at this point excluded from our 
analysis. : 

Unlike most investigations of LTR-RT dynamics in vascular plants, we chose 
to exclude LTRs with a history of intraelement gene conversion (Table 2). This 
is because the estimation of LTR-RT insertion time assumes that the 5’ and 3’ 
ends of an LTR-RT experience mutations independently after insertion. 
Intraelement gene conversion in LTR-RTs is common in plant genomes (Cossu 
et al., 2017) and acts to homogenize the 5’ and 3’ ends of an LTR-RT. This leads 
to a reduction in the amount of sequence divergence, and an underestimation 
of LTR-RT insertion times. Our exclusion of LTR-RTs with evidence of 
intraelement gene conversion confronts this potential bias. LTR-RTs with 
significant evidence of intraelement gene conversion (p<0.05) were identified 
with GENECONV (Sawyer, 1989) using the following parameters “/w123/lp/f/ 
eb/g2 -include_monosites”. These LTR-RTs were excluded from the dataset 
and all subsequent analyses. 

LTR-RT insertion time estimation.—Insertion times were calculated follow- 
ing the methods of SanMiguel et al., (1998). LTR 5’ and 3’ pairs were aligned in 
MUSCLE (Edgar, 2004) and divergence between LTR pairs was calculated in 
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fastphylo v1.0.1 (Khan et al., 2013) using the Kimura two-parameter (K2P) and 
the GTR+y nucleotide substitution models in FastTree v2.1 (Price, Dehal, and 
Arkin, 2010). Insertion times were converted into million years using lineage 
specific synonymous substitution rates per site per year sourced from the 
literature (Appendix 1). Due to differences in rate heterogeneity observed in 
heterosporous versus homosporous ferns (Korall, Schuettpelz, and Pryer, 2010; 
Testo and Sundue, 2016), we chose to adopt a faster rate for heterosporous 
ferns. We also chose to summarize the distribution of LTR-RT insertion times 
with the median, and standard deviation for each taxon. Previous studies have 
employed half-life estimates of LTR-RTs (Lynch, 2007), however many of the 
distributions of LTR-RT insertion times we observed did not follow a negative 
exponential distribution and their LTR-RT half life could not be calculated. In 
addition, in our analyses the GTR+y nucleotide model consistently estimated 
younger insertion times for all taxa, with an average 56% reduction in the 
median estimate of LTR-RT insertion time relative to the K2P model. We chose 
to describe the results of the GTR+y model because it allowed a greater number 
of parameters to vary regarding how nucleotides change over time in 
comparison to the more simple K2P model (Tavaré, 1986). 

We also tested the robustness of our estimation of the median LTR-RT 
insertion time from few available LTR-RT sequences recovered in several of 
the homosporous fern genomes. We randomly subsampled 30, 50, and 100 
sequences from the total LTR-RTs recovered in Azolla filiculoides using a 
monte carlo sampling method with 10,000 replicates. We then calculated the 
median LTR-RT insertion time for each replicate and the median of all 
replicates and compared this value to our estimated value from all available 
full length LTR-RTs. 

Relationship between median LTR-RT insertion time and genome size.— 
Haploid nuclear genome size estimates were sourced from the literature. Due 
to the large variation in haploid genome size across taxa, a regression was 
performed on median LTR-RT insertion time against log; [haploid nuclear 
genome size Mbp]. In addition, because of the shared evolutionary history of 
the species included in our analyses, a phylogenetic independent contrast was 
performed on this dataset. Sequences for the rbcL plastid marker were 
downloaded from NCBI GenBank for each species and the outgroup 
Physcomitrella patens (Appendix 1). A rbcL sequence for Plagiogyria 
formosana was not available, and instead a sequence from the congener P. 
yvakumonticola Nakaike was used. Sequences were aligned in MUSCLE, and an 
ultrametric phylogenetic tree was inferred using BEAST v2.0 (Bouckaert et al., 
2014). The outgroup was pruned from the tree and the phylogenetic 
independent contrast was performed on both the logiolhaploid nuclear 
genome size Mbp] and mean LTR insertion date using the ‘pic’ function in 
the R ape package (Paradis, Claude, and Strimmer, 2004). A nexus file of the 
phylogenetic tree is available at TreeBASE (http://purl.org/phylo/treebase/ 
phylows/study/TB2:S24023). 
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Fic. 1. Estimated Distribution of LTR-RT Insertion Times in Vascular Plants. Ridgeline density 
plots of species across vascular plants with a focus on ferns and lycophytes. The heterosporous 
Selaginella and members of Salviniaceae marked by an asterik (*) share similar LTR-RT insertion 
dynamics with angiosperms. Homosporous ferns share similar dynamics to gymnosperms. 
Minimum density displayed 0.005, LTR-RT insertion times are based on the GTR + y nucleotide 
substitution model and lineage specific substitution rates. 


RESULTS 


Fern LTR-RT dynamics.—We observed different patterns of LTR-RT 
dynamics in homosporous versus heterosporous ferns. Across all five 
homosporous fern taxa analyzed, estimated LTR-RT insertion times followed 
a broad unimodal distribution with three common attributes (Fig. 1). 
Homosporous ferns had few recent LTR-RTs inserted within the past million 
years. Less than 3% of the LTR-RT insertions occurred within the past one 
million years in Ceratopteris richardii and Pteridium aquilinum, and 1% or 
less of LTR-RT insertions occurred within the past million years in Dipteris, 
Plagiogyria, and Cystopteris. The most recent activity in the homosporous 
ferns was found in Cystopteris with roughly 12% of the insertions occurring 
between 1-2 mya. The majority of the LTR-RTs in homosporous ferns were 
estimated to be inserted between 5-13 mya, with a median of 5.76 mya in 
Plagiogyria formosana, 6.87 mya in Ceratopteris richardii, 7.39 mya in 
Cystopteris protrusa, 7.5 mya in Dipteris conjugata, and 8.58 mya in Pteridium 
aquilinum. Finally, we found numerous LTR-RT insertions beyond 15 mya 
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across all homosporous fern taxa, comprising roughly 10% of the total 
recovered LTR-RT insertions for each taxon. The oldest full length LTR-RT 
recovered for each taxon was 39.28 mya in Pteridium, 34.57 mya in 
Ceratopteris, 28.68 mya in Dipteris, 25.42 mya in Cystopteris and 24.99 mya 
in Plagiogyria. 

LTR-RT dynamics in the two heterosporous ferns were markedly different 
from the homosporous ferns. In both Azolla and Salvinia, estimated LTR-RT 
insertion times were similar to those observed in angiosperms with high 
amounts of recent activity (Fig. 1). In contrast to the homosporous ferns, an 
estimated 85% of LTR-RT insertions occurred within the past one million 
years in Azolla and 53% in Salvinia. Median insertion times were also much 
younger with an estimated median of 0.39 mya in Azolla and 0.93 mya in 
Salvinia. In addition, fewer than 0.1% of LTR-RTs were older than 5 mya in 
Azolla and Salvinia. 

We explored the effect of small sample sizes on the accuracy of estimating 
the median value as observed in the homosporous ferns. We used a monte 
carlo subsampling approach with sample sizes of 30, 50, and 100 replicates 
representing possible numbers of LTR-RTs recovered from the Azolla genome. 
For all three subsampling sizes, we did not find a strong effect of sample size 
on the accuracy of our estimation of median LTR-RT insertion time (n=30, z- 
score=-0.1168; n=50, z-score=-0.1558; n=100, z-score=-0.2004). 

Lycophyte (Selaginella) LTR-RT dynamics.—All three Selaginella species 
shared similar patterns of LTR-RT activity. These LTR-RT dynamics are 
comparable to angiosperms and the heterosporous ferns (Fig. 1), in which a 
large proportion of the estimated LTR-RTs insertions occurred 0-4 mya. 
Median LTR insertion times for the Selaginella species were 0.33 mya in S. 
lepidophylla, 1.28 mya in S. tamariscina, and 1.37 mya in S. moellendorffii. In 
all Selaginella genomes sampled, a majority of LTR-RTs have insertion times 
within the past 1 million years. An estimated 39% of LTR-RTs are inserted 
within the past 1 million years in S. moellendorffii, 41% in S. tamariscina, and 
75% in S. lepidophylla. Few LTR-RTs inserted beyond 10 mya were recovered 
for any Selaginella species, the oldest estimated LTR-RT insertion for S. 
lepidophylla was 20.75 mya, 22.2 mya for S. moellendorffii, and 11.58 mya for 
S. tamariscina. 

Relationship between median LTR-RT insertion time and genome size.—A 
large range of median LTR-RT insertion times and genome sizes were observed 
in our analyses (Table 3). Median LTR-RT insertion times range from 0.24 mya 
in Erythranthe guttata to 13.26 mya in Pinus sylvestris, and haploid nuclear 
genome sizes range from 86 Mbp in Selaginella moellendorffii to 22.47 Gbp in 
Pinus sylvestris. A strong significant positive correlation is observed between 
the haploid nuclear genome size and median LTR-RT insertion time (Fig. 2). 
Species with small genome sizes consistently have relatively recent median 
LTR-RT insertion times, and species with large genomes have older median 
LTR-RT insertion times. This is evident for the linear regression of median 
LTR-RT insertion time against log; [haploid nuclear genome size] (Adj- 
R*=0.76, y=0.1684x + 2.3826, p<0.001, df=16, f-statistic=53.64), as well as 
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TasLe 3. LTR-RT Summary Statistics. Median insertion time and standard deviation of full length 
LTR-RTs recovered in the genome of each species from our analysis. Insertion times based on 
lineage specific substitution rates (Appendix 1). Summaries reported for both K2P and GTR + y 
nucleotide substitution models. 


Haploid Haploid 
nuclear genome chromosome 
Taxon size (Mbp) number (n) GTR + y ** K2P 
Selaginella lepidophylla 109 N/A 0.33 (1.6) 0.62 (2.20) 
Selaginella moellendorffii 86 10 1.37 (2.02) 2.6 (2.68) 
Selaginella tamariscina 150 10 1.28 (1.51) 2.35 (2.51) 
Azolla filiculoides 750 22 0.39 (0.53) 0.72 (0.94) 
Ceratopteris richardii 11,205 39 6.87 (6.07) 11.6 (8.16) 
Cystopteris protrusa 4,230 42 vi zs (6.03) 13.37 (11.01) 
Dipteris conjugata 2,450 33 5 (5.42) 12.47 (9.39) 
Plagiogvria formosana 14,810 N/A D. oe (5.17) 12.83 (8.66) 
Pteridium aquilinum 15,650 52 8.58 (5.83) 15.12 (9.33) 
Salvinia cucullata 260 N/A 0.93 (0.76) 1.63 (1.36) 
Gnetum montanum 4,200 22 8.72 (5.41) 15.59 (8.84) 
Picea abies 19,570 12 12.43 (6.27) 21.2 (10.47) 
Pinus sylvestris 22,474 12 13.26 (9.00) 21.66 (13.6) 
Amborella trichopoda 868 13 4.11 (1.98) 6.7 (3.23) 
Arabidopsis thaliana 135 * 0.64 (0.64) 1.06 (1.05) 
Erythranthe guttata 362 14 0.24 (0.52) 0.43 (0.87) 
Solanum lycopersicum 950 12 1.09 (1.34). 1.86 (1.31) 
Sorghum bicolor 697 10 0.46 (0.64) 0.83 (1.07) 


when phylogenetic relationships are taken into account (Adj-R*=0.76, 
y=5.067x , p<0.001, df&16, f-statistic=55.49). 


DISCUSSION 


Across the ten fern and lycophyte genomes we analyzed, we found that 
haploid nuclear genome size was positively correlated with median LTR-RT 
insertion time. Combined with data from representative seed plants, we found 
that the age distribution of LTR-RT insertions was well correlated with 
vascular plant haploid nuclear genome size. Across vascular plants, we found 
that species with large genomes have older median LTR-RT insertion times 
such as those seen in gymnosperms and homosporous ferns. Species with 
relatively small genomes had more recently inserted LTR-RTs such as those 
seen in Selaginella and the heterosporous ferns. This first survey of LTR-RT 
insertion times for homosporous ferns and other species bridges a significant 
phylogenetic gap in understanding the evolution of vascular plant genome 
size. This result is consistent with previous research from seed plants that 
found genome size and LTR-RT activity are well correlated. 

Previous analyses have found a strong positive correlation between haploid 
nuclear genome size and chromosome number in ferns and lycophytes 
(Bainard et al., 2011; Barker, 2013; Clark et al., 2016; Leitch and Leitch, 
2013; Nakazato et al., 2008; Sessa and Der, 2016). This pattern is absent from 
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Fic. 2. Genome Size by LTR-RT Median Insertion Time. Species with small genome sizes 
consistently have relatively recent LTR-RT insertion times, and species with large genomes have 
relatively older LTR-RT insertion times. A) The log;,(haploid nuclear genome size Mbp) and 
median LTR-RT insertion time (mya) across vascular plants. The median age of LTR-RT insertions 
is significantly correlated with haploid nuclear genome size (Adj-R°=0.76, y=0.1684x + 2.38262, 
p<0.001, df16, f-statistic=53.64). B) The phylogenetically corrected log;)(haploid nuclear genome 
size Mbp) and median LTR-RT insertion time (mya) across vascular plants. The phylogenetically 
corrected median time of LTR-RT insertions is also significantly correlated with haploid nuclear 
genome size (Adj-R°*=0.76, y=5.067x , p<0.001, df&16, f-statistic=55.49). 


seed plants (Nakazato et al., 2008), and instead most of the variation in their 
nuclear genome size is due to LTR-RT activity (El Baidouri and Panaud, 2013; 
Michael, 2014). The high correlation of chromosome number and genome size 
combined with uniquely small and uniform sized chromosomes in ferns and 
lycophytes (Nakazato et al., 2008; Wagner and Wagner, 1980) suggested that 
transposable element activity may be relatively low and not contribute 
significantly to genome size variation (Nakazato et al., 2008). However, our 
results suggest that LTR-RT dynamics also strongly influence the nuclear 
genome sizes of ferns and lycophytes (Selaginella spp.). 

Our current results indicate that the relationship among chromosome 
number, LTR-RT activity, and genome size is not clearly understood in these 
plants. It may be the rate of LTR-RT activity is sufficiently low enough that it 
does not interfere with the strong correlation of genome size and chromosome 
number generated by the unknown processes driving chromosome size 
uniformity. We currently have too few sampled fern and lycophyte genomes 
to formally analyze whether chromosome number or LTR-RT activity explains 
more variation in genome size, as well as their joint effect. However, the 
proportion of variation in genome size explained by LTR-RT activity across the 
fern and lycophyte species we sampled (adj-R“=0.88) is positive and possibly 
more than that explained by chromosome number (e.g., rho=0.5 Clark et al., 
2016; R°=0.83 Nakazato et al., 2008; R*=0.6 Sessa and Der 2016). Future 
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analyses leveraging better sampled fern and lycophyte genomes are needed to 
explore the relationships among chromosome number, LTR-RT activity, and 
genome size. In addition, comparative and population genomic approaches of 
LTR-RT activity and chromosomal evolution are needed in this clade to 
understand the unique pattern of genome evolution. 

Previous analyses have proposed that the high correlation of genome size 
and chromosome number, as well as the uniform size of fern chromosomes 
(Nakazato et al., 2008; Wagner and Wagner, 1980;), is caused by the 
suppression of transposable elements in homosporous ferns (Nakazato et al., 
2008). The older average LTR-RT insertion times we observed in homosporous 
ferns may support this hypothesis. The lifecycle of a LTR-RT inside a host 
genome involves replication, insertion into a different part of the genome, and 
potential silencing and deactivation of its activity via epigenetic and 
methylation processes from the host genome. The position of LTR-RTs inside 
the host genome can affect the expression of proximal genes, as genes proximal 
to methylated LTR-RTs often experience higher methylation and reduced 
expression (Hollister and Gaut, 2009; Niederhuth et al., 2016). Cytosine 
methylation is one type of DNA methylation. Across vascular plants, the 
highest cytosine methylation found to date is in homosporous ferns with an 
estimated 57% of their genes having greater than 90% of their cytosines 
methylated (Takuno, Ran, and Gaut, 2016). In contrast, cytosine methylation in 
Selaginella appears to be localized to a subset of approximately 10% of genes 
(Takuno, Ran, and Gaut, 2016; Van Buren et al., 2018). The reason for high 
methylation in homosporous ferns is unclear. However, homosporous ferns are 
unique among vascular plants because following whole genome duplication 
chromosomes appear to be largely retained (Barker and Wolf, 2010; Barker, 
2013; Nakazato et al., 2008). It may be that the high methylation in 
homosporous ferns is associated with silencing LTR-RTs and genes on these 
retained chromosomes. Future analyses of homosporous fern genomes are 
needed to disentangle the relationships among these processes and determine 
the processes ultimately driving their unique patterns of genome evolution. 

Older observed median insertion times of LTR-RTs is consistent with 
observed patterns of structural turnover in homosporous fern genomes. Unlike 
many angiosperm lineages, ferns and lycophytes have highly uniform 
chromosomes in both size and structure (Wagner and Wagner, 1980), and 
some lineages exhibit evidence of conserved genome size over long periods of 
geologic time (Baniaga, Arrigo, and Barker, 2016; Clark et al., 2016; Schneider 
et al., 2015). Even fossilized and extant chromosomes of the royal ferns 
(Osmundaceae) do not show significant difference in size despite more than 
180 million years (Bomfleur et al., 2014), and sterile hybrids may be generated 
between species of different genera that diverged more than 60 million years 
ago (Rothfels et al., 2015). Our observation of older median insertion times of 
LTR-RTs in homosporous ferns is consistent with a slow rate of structural 
turnover in their genomes. 

Why do we observe less active LTR-RTs in homosporous fern genomes? A 
possible explanation is their unique mating systems. Although most diploid 
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homosporous ferns sampled to date are highly outcrossing (Haufler 1987; 
Sessa, Testo, and Watkins, 2016; Soltis and Soltis, 1992), homosporous ferns 
are capable of sporophytic and gametophytic selfing (Haufler et al., 2016; 
Klekowski and Baker, 1966; Klekowski and Lloyd, 1968). Mating system plays 
a large role in transposable element (TE) dynamics, of which LTR-RTs are one 
type (Agren and Clark, 2018; Agren et al., 2014; Boutin, Le Rouzic, and Capy, 
2012; Charlesworth and Langley, 1986). Theoretical models suggest that if TEs 
are deleterious to host fitness, the activity and accumulation of TEs should be 
reduced in inbreeding species. This is because selfers reduce the spread of TEs 
between individuals, and, compared to outcrossers, more efficiently purge 
insertions with deleterious recessive effects (Morgan, 2001; Wright and 
Schoen, 1999; Wright et al., 2008). Outcrossing may also increase the activity 
and genetic diversity of LTR-RTs during transposition burst cycles (Sanchez et 
al., 2017). Expectations of these models parallel those concerning genetic load 
in homosporous ferns (Hedrick, 1987), and TEs, if deleterious, may comprise a 
large component of genetic load. 

Applying these theoretical models of mating system and TE dynamics to 
homosporous ferns we would expect that strongly outcrossing species should 
have more recent LTR-RT insertional activity and species with a history of 
inbreeding via selfing should have fewer recent LTR-RT insertions. Interest- 
ingly, despite relatively limited sampling, our analyses of LTR-RT dynamics in 
ferns and lycophytes are consistent with these predictions. The heterosporous 
Selaginella and the two aquatic ferns Azolla filiculoides and Salvinia cucullata 
share similar LTR-RT dynamics characterized by a predominance of recent 
insertions. Although there may be several adaptive roles of heterospory in 
vascular plants (Petersen and Burd, 2018), it does function to promote 
outcrossing. In contrast, LTR-RT dynamics in the homosporous ferns which 
are capable of a greater array of selfing mechanisms, have relatively fewer 
recent insertions. As the costs to sequencing decrease it would be worthwhile 
to experimentally manipulate and compare the TE and LTR-RT dynamics of 
homosporous ferns with different mating systems such as C. richardii and its 
sister species to empirically verify at a finer scale these macroevolutionary 
patterns. Similarly, it would also be useful to compare LTR-RT dynamics in 
homosporous ferns with high rates of gametophytic selfing, such as species in 
the Ophioglossaceae (Barker and Hauk, 2003), to species with low rates of self- 
fertilization. In general, ferns and lycophytes may prove fertile ground for 
testing hypotheses of LTR-RT dynamics and mating systems because of their 
diverse mating systems. 

Another interesting result of our analysis is evidence that we did not observe 
an effect of generation time on the insertion dynamics of LTR-RTs in 
homosporous ferns. For example, the minimum generation time of P. 
aquilinum is two to three years (Klekowski, 1972), whereas in C. richardii 
the minimum generation time is three months (Banks, 1999; Hickok and 
Warne, 2004; Hickok, Warne, and Fribourg, 1995). If generation time had a 
strong effect on the activity of LTR-RTs, we would expect to observe a greater 
amount of recent LTR-RT insertions in C. richardii, or much older LTR-RT 
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insertion times in the other fern taxa sampled. Instead, we found that C. 
richardii has a similar LTR-RT insertion distribution as observed in other fern 
taxa. 

The aim of our manuscript was to investigate LTR-RT insertion dynamics in 
ferns and lycophytes. We do note that the genome-skimming data used has 
several limitations but these were the only genome wide data available for 
homosporous ferns, and we view our analysis as a necessary first glimpse. 
Genome-skimming data has been used in angiosperms to characterize LTR-RT 
evolutionary dynamics (Bardil, Tayale, and Parisod, 2015; Senerchia et al., 
2013), and is consistent with whole genome based sequencing approaches at 
coverages as low as 0.1x (Choudhury, Neuhaus, and Parisod, 2017). 
Undoubtedly, young LTR-RT sequences occur in homosporous fern genomes, 
however our analysis suggests that their density is much reduced relative to 
the proportion of older LTR-RT elements. Future whole genome sequencing 
that leverages long read technology is warranted to further investigate LTR-RT 
dynamics in homosporous fern genomes. 

LTR-RTs are important components of vascular plant genomes. Our analysis 
investigated the LTR-RT dynamics of ferns and lycophytes and placed the 
results in the context of LTR-RT dynamics in other vascular plants. Both the 
heterosporous Selaginella and members of the Salviniaceae share similar LTR- 
RT insertion patterns to angiosperms, characterized by a predominance of 
insertions up to 5 mya. In contrast, the homosporous ferns share older LTR-RT 
insertion times intermediate between angiosperms and gymnosperms. We also 
show that across vascular plants LTR-RT insertion times are strongly 
associated with haploid nuclear genome size. Species with small genomes 
have rapid LTR-RT turnover rates with many young insertions, while species 
with larger genomes consistently have older LTR-RTs with relatively little 
recent activity. Ferns and lycophytes represent ancestors of the earliest 
diverging lineages of vascular plants, and in order to understand LTR-RT 
dynamics in vascular plants they must be included. Elements of their 
reproductive process such as the capacity for gametophytic selfing in 
homosporous ferns also make them unique study systems for future inquiries 
into LTR-RT and other transposable element dynamics. 
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APPENDIX 1. List of synonymous substitution rates per site per year used to estimate LTR-RT 
insertion times. List of GenBank accession numbers used to infer evolutionary relationships for a 
phylogenetic independent contrast (* no available sequence for Plagiogyria formosana was 
available and the congener P. vakumonticola was used). 


Species 


Selaginella lepidophylla 


Selaginella moellendorffii 


Selaginella tamariscina 
Azolla filiculoides 
Ceratopteris richardii 
Cystopteris protrusa 
Dipteris conjugata 
Plagiogyria formosana* 
Pteridium aquilinum 
Salvinia cucullata 
Gnetum montanum 
Picea abies 

Pinus sylvestris 
Amborella trichopoda 
Arabidopsis thaliana 
Erythranthe guttata 
Solanum lycopersicum 
Sorghum bicolor 
Physcomitrella patens 


Synonymous 
substitution 
rate/site/year 


6.50E-09 
6.50E-09 
6.50E-09 
1.50E-08 
4.79E-09 
4.79E-09 
4.79E-09 
4.79E-09 
4.79E-09 
1.50E-08 
2.20E-09 
2.20E-09 
2.20E-09 
6.50E-09 
1.50E-08 
1.50E-08 
1.30E-08 
1.30E-08 
N/A 


Reference 


Gaut et al., 1996 

Gaut et al., 1996 

Gaut et al., 1996 

Koch et al., 2000 

Barker, 2009 

Barker, 2009 

Barker, 2009 

Barker, 2009 

Barker, 2009 

Koch et al., 2000 

Nystedt et al., 2014 

Nystedt et al., 2014 

Nystedt et al., 2014 

Amborella Genome Project, 2013 

Koch et al., 2000 

Koch et al., 2000 

Ma and Bennetzen, 2004 

Ma and Bennetzen, 2004 
N/A 


NCBI GenB ank 
accession number 


AF419051.1 
KY023084.1 
AB574655.1 
KM360662.1 
EU352297.1 
JX874051.1 
EF588692.1 
AB574748.1 
AY300097.1 
U05649.1 
AY056575.1 
EU364777.1 
ABO097775.1 
L12628.2 
KU739560.1 
KF997284.1 . 


XM_010327541.3 


AM849341.1 
ABO013656.2 
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Asstract.—Reticulate evolution, in which phylogenetic relationships are not strictly bifurcating 
(tree-like), is a common feature of fern evolution. Ferns are prone to hybridization and whole 
genome duplication, two processes that can make untangling phylogenetic relationships among 
species challenging. Next-generation sequencing technologies have greatly increased the amount of 
data available for analyzing various aspects of evolutionary history, and here we test the ability of 
one next-generation sequencing approach to identify the progenitors of allopolyploids. We 
produced and analyzed double-digest restriction-site-associated DNA (ddRAD) sequences from six 
species of North American Dryopteris, including two allopolyploids and their respective diploid 
parents. The relationships of these species have been confidently established in previous studies, 
and our goal was to determine the extent to which RAD data are capable of identifying these known 
relationships. Analyses of the genetic structure in our samples reliably separated the diploids from 
one other, but in general each polyploid sample resembled one or the other of its progenitors, or 
had genetic variation unassignable to either parent. None of the polyploid samples had 
unambiguous genetic contributions from both known parents, as we had expected. These results 
may have been influenced by small overall sample size, different numbers of samples from the two 
diploid parents in each pair, and the large divergence times between the diploids. These are all 
potentially important issues to consider when designing similar studies, and our results therefore 
have useful implications for researchers interested in using a RAD approach to study polyploid 
complexes. 


Key Worps.—allopolyploid, next-generation sequencing, phylogenetics, reticulate evolution 


Fern enthusiasts, amateur and professional alike, are captivated by these 
plants for a multitude of reasons. For researchers interested in evolutionary 
processes, ferns are a particularly fascinating lineage to study because of their 
propensity for polyploidy and hybridization. These two processes, which are 
particularly common in ferns compared to other land plants (Wood et al., 
2009), have the potential to influence many aspects of evolution, particularly 
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those related to genome size, structure, and complexity, as well as 
phylogenetic relationships (Soltis, Visger, and Soltis, 2014). Hybridization 
occurs when members of two distinct evolutionary lineages interbreed and 
produce offspring. Polyploidy, or whole genome duplication, is a complete 
doubling of the genome that results in offspring with at least twice the number 
of chromosomes and genetic content of their progenitors. These processes can 
occur independently or in synchrony, producing organisms known as 
allopolyploids that are the product of both hybridization and genome 
doubling. 

The non-bifurcating phylogenies that result from these reticulate evolution- 
ary processes require extra effort to decipher. The traditional workhorse of 
plant phylogenetics, the chloroplast genome, is maternally inherited in most 
plants, including in ferns (Gastony and Yatskievych, 1992; Vogel et al., 1998), 
and can therefore identify only one parent of a putative hybrid or 
allopolyploid. Identifying the second, paternal parent, requires information 
from biparentally inherited nuclear markers. For the last two decades, 
obtaining data from these markers has relied on painstaking and time- 
consuming laboratory procedures to isolate each homoeologous copy (using an 
Escherichia coli vector that replicates via cloning), so that each can be 
sequenced independently. Genes such as gapCp (Schuettpelz et al., 2008), 
pgiC (Ishikawa et al., 2002), and many others (Rothfels, Li, et al., 2015) have 
been analyzed in this way and produced the first DNA-sequencing based 
confirmations of parentage in many fern polyploid complexes (e.g., in 
Dryopteris Adans. (Sessa, Zimmer, and Givnish, 2012b), Polystichum Roth 
(Jorgensen and Barrington, 2017), various Pteridaceae genera (Beck et al., 2010; 
Grusz, Windham, and Pryer, 2009), and many others). However, the rise of 
next-generation sequencing (NGS) technologies has inspired a natural desire to 
use these powerful and data-rich approaches as an alternative to labor- 
intensive gene-by-gene cloning and sequencing for analyzing polyploid 
complexes. Despite this, few studies have used NGS approaches to resolve 
reticulate evolutionary histories in ferns (but see Rothfels, Pryer, and Li, 2017). 

A principal limitation to the use of NGS data in studies of polyploids has 
been the challenge of correctly assembling homoeologous copies, especially 
when sequence data are generated on platforms with relatively short read 
lengths. Overcoming this and related bioinformatic challenges will be a critical 
step before widespread use of NGS approaches in the study of polyploid 
complexes can become feasible. An additional problem for ferns is their large 
genomes, which have made them less tractable than other plant lineages as 
study groups for NGS methods (for example, ferns were the last major lineage 
of land plants to have a reference nuclear genome sequenced, due in part to 
their massive genomes (F.-W. Li et al., 2018; Sessa et al., 2014). 

“Reduced representation” sequencing methods seek to minimize the 
complexity of genome assembly by sequencing only a subset of the complete 
genome (Andrews et al., 2016; Rowe, Renaut, and Guggisberg, 2011). While the 
cost of next-generation sequencing has decreased steadily, assembling the 
millions of short (typically 100-150 base pairs) sequencing reads produced by 
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these approaches remains an immense challenge. For many study systems, 
sequencing and assembling an entire genome remains out of reach, due either 
to financial limitations, assembly issues, or a combination of the two (e.g., 
difficult-to-assemble genomes benefit from long-read sequencing, which is 
more expensive than short-read sequencing). Whole genome sequencing may 
also be unnecessary for addressing questions of interest, such as identifying 
the parents of a polyploid species, a query for which sequence data from only 
one or a handful of markers is typically sufficient. In groups like ferns, where 
whole-genome sequencing is still impractical for the average researcher, 
reduced representation strategies are ideal for capitalizing on next-generation 
sequencing approaches while using resources efficiently. 

In the present study, we evaluated the utility of one reduced representa- 
tion, next-generation sequencing approach -— restriction-site-associated DNA 
sequencing (known as RAD or RADseq) — for identifying the progenitors of 
polyploid ferns. RAD and the related genotyping-by-sequencing (GBS) are 
both approaches that utilize the restriction enzyme cut sites that occur 
naturally across the genome (Andrews et al., 2016). Whole genomic DNA is 
first digested with restriction enzymes that cut the DNA only at specific 
sequences that are unique to each enzyme; these cut sites typically occur 
thousands of times throughout the genome, at different frequencies for 
different enzymes. The resulting DNA fragments will span a range of sizes, 
and those in the ideal range for next-generation sequencing can be selected at 
a later step in library preparation. By sequencing only fragments that are 
adjacent to restriction enzyme cut sites, RAD targets a non-random portion of 
the genome and therefore increases the likelihood of sequencing homologous 
regions across samples. This is especially important for organisms with large 
genomes, like ferns, where approaches such as genome skimming or 
“shotgun” sequencing (which theoretically sequence a truly random subset 
of the genome) are less likely to capture homologous sequences from different 
samples and species. Because SNPs are identified from individual (unas- 
sembled) reads in the RAD data analysis pipeline, this method does not suffer 
from the issues associated with assembling homoeologous copies from short- 
read data. However, while RAD and other reduced-representation approaches 
attempt to deal with the issue of large genome sizes, the analytical 
complications introduced by polyploidy still haunt these approaches, since 
the programs available for analyzing the datasets typically operate under the 
assumption that included taxa are diploid, a fundamental assumption (with 
numerous implications for expected copy numbers and locus behavior, 
among other things) that is not readily altered to accommodate data from 
known polyploids. For example, two of the most commonly used pipelines 
for processing GBS and RADseq data, Stacks (Catchen et al., 2013) and 
TASSEL (Bradbury et al., 2007), assume genotypes are diploid and do not 
permit allele frequencies to deviate from those expected in diploids, typically 
treating as noise information that may be in fact be the signal of polyploid 
genotypes. 
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Fic. 1. Relationships of diploid and polyploid species in the North American Dryopteris reticulate 
complex (based on (Sessa, Zimmer, and Givnish, 2012a, 2012b, 2012c). Letters below species 
names refer to their genomic designations. Pictures of all species are included at left (all photos by 
EB Sessa). 


We applied a double-digest RAD approach (ddRAD; so called because two 
restriction enzymes are used) to a small dataset for a polyploid complex in 
which relationships are known with confidence: North American Dryopteris. 
These ferns have been the focus of previous study by our research group 
(Sessa, Zimmer, and Givnish, 2012a, 2012b, 2012c; Sessa et al., 2015; Sessa 
and Givnish, 2014; Testo, Watkins, and Barrington, 2015), and the North 
American complex as a whole includes one extinct and four extant diploids, 
four allotetraploids, and one allohexaploid (Fig. 1). The parents of the 
allopolyploids were first hypothesized based on morphology and cytological 
observations (Walker, 1955, 1959, 1961, 1962, 1969), and later confirmed using 
sequences of gapCp and pgiC (Sessa, Zimmer, and Givnish, 2012b). Here we 
focused on two of the allotetraploids, D. campyloptera (Kunze) Clarkson and 
D. celsa (W. Palmer) Knowlt., Palmer & Pollard, and their respective diploid 
parents: D. intermedia (Willd.) A. Gray and D. expansa (C. Presl) Fraser-Jenk. & 
Jermy for D. campyloptera, and D. ludoviciana (Kunze) Small and D. goldiana 
(Hook. ex Goldie) A. Gray for D. celsa (Fig. 1). 

Our goal was to determine whether analysis of ddRAD sequences from a 
small sampling of these six species could recover evidence of their known 
polyploid-progenitor relationships by correctly grouping sequences from the 
two allotetraploids with their respective parent taxa, ideally with each 
polyploid showing evidence of equal genetic contributions from the two 
progenitors. The six species we selected are ideal for this study because all 
diploid progenitors are extant and can be sampled, and the diploids are 
sufficiently diverged from one another (Sessa, Zimmer, and Givnish, 2012a) 
that sequences from the polyploids can theoretically be assigned unambigu- 
ously to each of the four diploids. 
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TABLE 1. Sample ID and voucher information for samples included in this study. Institution 
acronyms follow Index Herbariorum (Thiers, 2018). 


U.S. state 
Genus Ploidy Sample ID collected Voucher 
1 Dryopteris campyloptera 4x E648 VA EBS 9722006 (FLAS) 
2 Dryopteris campyloptera 4x E650 VA EBS 9722005 (FLAS) 
3 Dryopteris campyloptera 4x E655 VA EBS 9722004 (FLAS) 
4 Dryopteris campyloptera 4x E649 VA EBS 9722016 (FLAS) 
5 Dryopteris celsa 4x E657 GA EBS 27 (WIS) 
6 Dryopteris celsa 4x E658 SC EBS 49 (WIS) 
7 Dryopteris celsa 4x E661 LA Price 94-2 (NY) 
8 Dryopteris celsa 4x E664 MO 3479307 (MO) 
9 Dryopteris expansa 2x E734 AK 5710532 (MO) 
10 Dryopteris goldiana 2x E700 NY EBS 65A (WIS) 
11 Drvopteris goldiana 2x E701 NY EBS 72 (WIS) 
12 Dryopteris goldiana 2x E715 NY EBS 12 (WIS) 
13 Dryopteris intermedia 2x E713 WV EBS 9722021 (FLAS) 
14 Drvyopteris intermedia 2X E714 VA EBS 9722010 (FLAS) 
15 Dryopteris intermedia 2x E736 MO 5198765 (MO) 
16 Dryopteris intermedia 2x E739 SC EBS 48 (WIS) 


MATERIALS AND METHODS 


_ Taxon sampling and DNA extraction.—We included sixteen samples 
representing six Dryopteris species, with all but two species represented by 
multiple accessions (Table 1). We extracted total gnomic DNA using a DNeasy 
Plant Mini Kit (Qiagen, Valencia, California, USA) following the manufactur- 
er’s protocol. 

ddRAD library preparation and sequencing.—We followed the ddRAD 
library construction protocol established by (Peterson et al., 2012), with a few 
modifications. Because of the large genome sizes of the focal taxa (average in 
Dryopteris diploids is 1C = 7.63 pg; Bainard et al., 2011), we replicated each 
sample three times during library preparation. To obtain equal numbers of 
reads for all individuals, we standardized DNA quantity prior to library 
preparation. 

We used two enzymes, Msel and EcoRI — a frequent cutter and an infrequent 
cutter, respectively (referring to the distribution of the enzymes’ cut sites 
across the genome) — to digest 6 tL of genomic DNA from each sample. We 
then ligated enzyme-specific, double-stranded adaptors (8-14 base pairs in 
length) to the digested DNA fragments, with the EcoRI adapter containing a 
unique barcode specific to each sample and replicate. We ensured successful 
ligation of the adaptors to the digested DNA by inspecting PCR products via gel 
electrophoresis visualization. We then pooled the restriction ligation product 
from each of the successful libraries and cleaned this pooled product using a 
QIAquick PCR Purification Kit (Qiagen, Valencia, California, USA). The 
pooled product was then run on an Elf Pippin Bioanalyzer (Sage Science, 
Massachusetts, USA) to select genomic fragments ranging from 350 to 700 bp 
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(size selection service provided by the University of Florida Interdisciplinary 
Center for Biotechnology Research; UF ICBR). We checked the success of the 
fragment size selection via gel electrophoresis and analysis on a TapeStation 
2200 Automated Electrophoresis (Agilent, California, USA) system. 

We performed a final round of PCR to anneal the Illumina sequencing 
primers to the digested DNA fragments, working with 1 pL at a time of 
digested, size-selected, pooled DNA. To reduce the opportunity for PCR errors 
we performed eighteen separate reactions and then combined the resulting 
PCR products for sequencing. We visualized a subset of the pooled product via 
gel electrophoresis to check for amplification success, and cleaned the 
remaining product using a QIAquick PCR Purification Kit. Pooled, clean PCR 
product was submitted to the UF ICBR where it was cleaned further using 
Ampure beads to remove unincorporated adaptors before being sequenced on 
an Illumina NextSeq500 platform, generating 2 X 150 bp reads. A 10% phixX 
spike was included during sequencing as an internal control. 

Data cleaning.—We processed the raw Illumina reads using the Process 
Radtags pipeline in Stacks (Catchen et al., 2013), retaining all reads with a 
quality score above 20 and splitting the raw reads by sample and replicate 
based on the unique EcoRI barcode, which was subsequently trimmed along 
with the cut site. This resulted in roughly 277 million reads. We then used the 
FAST-X Trimmer from the FAST-X Toolkit (http://hannonlab.cshl.edu/fastx_ 
toolkit/) to remove the Msel cut site from each secondary read and to trim the 
last four bases from the 3’ end of the primary reads to make all reads 127 bp in 
length. Cleaned, trimmed, forward and reverse reads for each individual were 
paired used PEAR v. 0.9.2 (Zhang et al., 2014), and all reads from each of the 
three replicates per sample were pooled together. Unfiltered, demultiplexed 
sequences have been deposited in NCBI GenBank Short Read Archive 
(PRJNA542715). 

Data analysis.—Processing the raw Illumina data occurred in three stages: 1) 
creation of a pseudo-reference genome, 2) alignment of reads and variant 
calling, and 3) admixture analysis. An R Markdown file describing this 
pipeline and all custom scripts is located here: https://github.com/ 
sylviakinosian/dryopteris_gbs. 


CREATION OF THE PSEUDO-REFERENCE GENOME.—Because there is no reference 
genome available for Dryopteris, we constructed a pseudo-reference using the 
sequences from the diploid taxa. We chose to use only the diploid taxa because 
they contain the majority of the sequence variation present in the tetraploids 
(which are known to be hybrids between the diploids), but harbor none of the 
possibly divergent sequences that may be present in the tetraploid species. We 
also performed our analyses separately on each tetraploid clade: Dryopteris 
celsa and its progenitors, and D. campyloptera and its progenitors. These two 
clades are separated by almost 40 million years of evolution (Sessa, Zimmer, 
and Givnish, 2012b). 

We created two pseudo-references (one for each clade) by first clustering 
highly similar sequences for each diploid taxon separately using VSEARCH v 
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2.4.2 (Rognes et al., 2016). Clustering was done at 92% similarity to create 
centroids for further clustering. We then clustered at 84% similarity and 
removed all sequences that collapsed at this stage, to exclude paralogs. We 
next combined the two diploid taxa from a given clade (either D. intermedia 
and D. expansa, or D. ludoviciana and D. goldiana) and clustered again using 
VSEARCH at 84% similarity; the resulting sequences were then used as our 
two pseudo-reference genomes. 


ALIGNMENT OF READS AND VARIANT CALLING.—Before calling variants for all of the 
included species, we first had to index the pseudo-reference genome from the 
previous step, which identifies sequence position points for the alignment. 
This was done using the INDEX function of BWA v. 0.7.10 (H. Li and Durbin, 
2009). Next, we used PicardTools v. 2.9.0 (Broad Institute, 2019) to create a 
sequence dictionary, and the INDEX function of SAMTOOLS v. 1.5 (H. Li et 
al,, 2009) was then used to create a FASTA index file. We used BWA ALN and 
SAMSE to align all individual reads to the appropriate pseudo-reference. Next, 
we used the SAMTOOLS functions VIEW, SORT, and INDEX to prepare all 
individual reads for variant calling. To call variants, we used two different 
_ methods. First, we used the GATK HaplotypeCaller v. 3.8.0 (McKenna et al., 
2010) to call variants on all samples as diploids. We then used VCFTOOLS v. 
0.1.15 (Danecek et al., 2011) to filter the resulting VCF file. For the second 
approach, we again used the GATK HaplotypeCaller, but called variants 
separately on the diploids and tetraploids (HaplotypeCaller allows the user to 
specify any ploidy). We then used a custom Python script to filter the resulting 
VCF files (VCFTOOLS does not support polyploids). Finally, we used custom 
Perl (v. 5.15.3, https://www.perl.org/) scripts to find the intersection of SNPs 
from the diploids and tetraploids. 


ADMIXTURE ANALYsIS.—We used the population genetics program ENTROPY v. 
1.2 (Gompert et al., 2014), which is very similar to the popular program 
STRUCTURE (Pritchard, Stephens, and Donnelly, 2000). While both are 
Bayesian, model-based approaches to population genetics, a key difference 
between them is that STRUCTURE assumes that individual genotypes are 
known a priori, whereas ENTROPY does not. ENTROPY calculates genotype 
likelihoods from raw sequence data and quality estimates; these genotype 
likelihoods are then used as an input for the model. STRUCTURE calculates 
the likelihoods repeatedly at each MCMC step from prior genotype 
assignments (Gompert et al., 2014). 

The first step of the ENTROPY analysis was to convert our VCF variant file to 
a Genotype Likelihood (GL) file format using a custom Perl script. A second 
Perl script was used to convert the GL file to a matrix for input to R v. 3.5.1 (R 
Development Core Team, 2016). We used the R package ADEGENET v. 2.1.1 
(Jombart, 2008) to perform a discriminant analysis of principal components 
(DAPC) to find the most likely source population for each individual. This 
analysis is similar to that performed by ENTROPY, but is less complex, and 
generates starting values that can be used to seed ENTROPY, which helps 
eliminate label swapping and allows the MCMC analysis to converge more 


274 AMERICAN FERN JOURNAL: VOLUME 109, NUMBER 3 (2019) 


quickly. We followed the DAPC protocol of Jombart and Collins (available at: 
http://adegenet.r-forge.r-project.org/files/tutorial-dapc.pdf). 

We ran ENTROPY for k = 2, 3, 4, and 5 with 2 chains in each analysis, and 
we examined the results obtained with and without the DAPC starting values, 
with high and low burn-in, and with various numbers of iterations. We found 
that the various values of these parameters did very little to alter the results, 
and so decided to run the final analysis using the starting values and with a 
large number of iterations (65,000) and a high burn-in (15,000). We used the 
program ESTPOST v. 1.2 (Gompert et al., 2014) to extract admixture 
proportions for each individual, and then used a custom R script and 
functions to visualize the ENTROPY output. 

Finally, we performed a principal component analysis on the ENTROPY 
output. We used the program ESTPOST to extract genotype probabilities for k 
= 2-5 and then read those data into R. We averaged the genotype probabilities 
for all k values, and then performed a PCA using the R function prcomp. All of 
the R code used to analyze and visualize the data is available as an Rmarkdown 
file on Github (https://github.com/sylviakinosian/dryopteris_gbs). 


RESULTS 


The pseudo-reference genomes constructed for the two clades differed in the 
relative evenness of contributions from the two sets of diploids: for the 
intermedia - expansa - campyloptera clade, 837,895 and 64,631 contigs were 
retrieved from D. intermedia and D. expansa, respectively. In the /Judoviciana - 
goldiana - celsa clade, the contigs were a bit more evenly divided, with 
440,468 retrieved from D. goldiana and 132,480 from D. ludoviciana. 

The first analysis, with variants called on all samples as diploids, retained 
1288 SNPs for the intermedia - expansa - campyloptera clade and 2122 SNPs 
for the Judoviciana - goldiana - celsa clade. The second analysis, which called 
variants for diploids and tetraploids separately, retained 3964 SNPs for the 
intermedia - expansa - campyloptera clade, and 4419 SNPs for the Judoviciana 
- goldiana - celsa clade. We performed the ENTROPY analysis on both sets of 
SNPs, and although calling variants separately on diploids and tetraploids 
obviously increased the total number of SNPs retained for both clades, we did 
not see a marked difference in the ENTROPY results between the two variant 
calling routines. We used the set of SNPs called only as diploids in the final 
analyses reported here. 

ENTROPY analyses of the genotype data generally separate the diploid taxa 
from one another, but for both clades, the polyploid species contain genomic 
contributions primarily from one parent or the other (in the case of Dryopteris 
celsa), or they contain a substantial fraction from one parent as well as 
additional fractions that are not attributed to the second parent (in the case of 
D. campyloptera) (Fig. 2). For the D. campyloptera samples, the diploid D. 
intermedia was the dominant contributor, with some small contributions from 
the other diploid parent, D. expansa, and additional fractions unassigned to 
either diploid. 
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Fic. 2. Results of ENTROPY analysis of genotype data for two polyploid clades in North American 
Dryopteris. For each clade, we investigated admixture proportions between the species of each 
clade with k = 2-5 ancestral source populations. Our expectation, given that we know the 
parentage of the tetraploids in both clades, would have been to see equal representation of the two 
diploid progenitors in the genetic makeup of each tetraploid sample. In contrast to that 
expectation, we found that for the D. campyloptera clade, at all k values, D. intermedia was the 
greatest contributor to the genome of the tetraploid hybrid D. campyloptera, and in the D. celsa 
clade, while the two diploid progenitors were both found to contribute to the tetraploid samples, it 
was always as an overwhelming contribution towards one parent or the other, in each of the 
samples. 


The principal component analysis yielded a similar pattern to the ENTROPY 
analysis. In the D. expansa - intermedia - campyloptera clade, the D. 
campyloptera individuals clustered most closely with the D. intermedia 
individuals (Fig. 3a). In the D. Judoviciana - goldiana - celsa clade, D. celsa 
clustered with both progenitor species, although more individuals clustered 
with D. Judoviciana individuals than with D. goldiana (Fig. 3b). 


DISCUSSION 


Dryopteris campyloptera clade.—For all values of k tested, SNPs associated 
with the diploid Dryopteris intermedia dominate the genetic complement of 
the tetraploid samples (Fig. 2, 3). Drvopteris expansa, the other diploid parent, 
has a distinct genotype that rarely appears in any of the tetraploids. We had an 
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Fic. 3. Principal component analyses for the two clades. Legends indicate species names and 
number of samples. A).In the D. expansa - intermedia - campyloptera clade, the tetraploid D. 
campvloptera clusters most closely with D. intermedia, as in the admixture analysis (Fig. 2). There 
is also reasonable genetic variation in the progenitor D. intermedia. B) In the D. ludoviciana - 
goldiana - celsa clade, the tetraploid D. celsa clusters with both progenitors. Interestingly one 
progenitor, D. goldiana, shows more variation here than in the admixture plot (Fig. 2). Clusters of 
points in the lower left-hand corners of both plots have been jittered slightly to make the points 
distinguishable. 


unequal number of samples from the two diploid parents in this group: three 
from D. intermedia and one from D. expansa. When constructing the reference 
genome for this clade, this unevenness would have resulted in a majority of the 
sequences belonging to the D. intermedia genome, as we saw — there were 
nearly 13 times as many contigs from D. intermedia in the pseudo-reference 
genome as there were from D. expansa. Consequently, most of the SNPs called 
were from this progenitor. Some SNPs associated with D. expansa do occur in 
each of the D. campyloptera samples at k=3, 4, and 5 (visible as extremely thin 
red lines at the tops of the stacked bars for those k values in Fig. 2), but these 
are extremely minor contributions to the tetraploid genome samples. Atk=5,a 
more substantial D. expansa contribution occurs in one of the tetraploids 
(sample E649), further supporting the hypothesis that only a small number of 
D. expansa sequences made it into the reference and subsequent SNP calling. 

An additional genetic component found in the D. campyloptera samples at k 
= 3-5 (light grey sections in Fig. 2) may be due to sequence divergence and 
accumulation of mutations in cut sites that would have occurred subsequent to 
the formation of the polyploid species, whose earliest inferred age of formation 
is 4.6 million years ago (Sessa, Zimmer, and Givnish, 2012b). That is an 
adequate amount of time for numerous mutations to occur that would alter the 
location and/or frequency of cut sites, and this would potentially produce 
fragments in the polyploids with lengths and sequence compositions not 
found in either parent species. Considering this, D. campyloptera sample E649 
perhaps best represents our a priori expectations for a successful result from 
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this study, as it includes sets of SNPs found in both parents as well as an 
additional component not found in either diploid. 

We also found evidence of genetic diversity in one of the diploid parents in 
this clade, D. intermedia. The three samples of D. intermedia differ 
considerably from one another in both the ENTROPY admixture analysis 
and the PCA (Figs. 2, 3), and at k= 2 a large portion of SNPs in two of the D. 
intermedia samples are associated with the D. expansa sample. At higher k 
values, the three D. intermedia samples become much more distinct from D. 
expansa; at k= 4 and 5, two of the D. intermedia samples resemble each other 
closely, while the third is distinct and resembles the D. campyloptera samples. 
These results suggest that there is substantial genetic diversity within and 
between populations of Dryopteris intermedia, which is unsurprising given its 
wide geographic range (Montgomery and Wagner, 1993). Both D. intermedia 
and D. expansa diverged from their closest diploid relatives in the Miocene 
(ca. 10-15 mya; Sessa, Zimmer, and Givnish, 2012a), and it therefore seems 
likely that we might discover additional genetic variation in D. expansa as 
well, if we sampled additional populations across its broad range. 

Dryopteris celsa clade.—For this clade, we also had three samples from one 
parent (D. goldiana) and one from the other (D. /Judoviciana), but there was less 
apparent sampling bias of the under-represented parent in the pseudo- 
reference assembly. The balance between the number of contigs these species 
contributed to the reference was more equal than in the D. campyloptera clade: 
after clustering, 440,468 contigs were retrieved from the D. goldiana 
individuals and 132,480 from D. Judoviciana. In all of the analyses of this 
clade (k = 2-5), D. goldiana and D. ludoviciana are assigned to separate 
populations; three of the D. celsa individuals are assigned to the “/udoviciana” 
population, and one is assigned to the “goldiana” population. There is less 
apparent bias in these assignments than there was in the D. campyloptera 
clade, in that both parental genotypes are present in the tetraploids. However, 
the results are clearly at odds with our expectations, which would have been 
for each of the tetraploid samples to show clear evidence of genetic 
contributions from (at least) two sources, rather than being dominated by 
only one. 

In this clade we also saw some evidence of genetic variation in one 
progenitor species, Dryopteris goldiana. This was most evident from the PCA 
(Fig. 4), where one of the D. goldiana individuals clustered very differently 
from the other two. Dryopteris goldiana has a broad geographic range, but is 
regionally rare and locally abundant across its distribution, and is found only 
in rich woods and ravines (Montgomery and Wagner, 1993). This could 
potentially isolate populations from one another, accounting for the diversity 
observed in the PCA results. Dryopteris ludoviciana has a much narrower 
range than any of the other three progenitor species in this study (Montgomery 
and Wagner, 1993), and further sampling would be required to reveal whether 
it has a similar amount of genetic variation as the other progenitor in this 
particular clade. 
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RAD data and analysis of polyploid complexes.—The goal of this research 
was to assess the utility of RAD sequence data for identifying progenitors in a 
polyploid complex. Our dataset, which consisted of a small number of samples 
from a group with known polyploids and no reference genome, is typical of 
what might be available to researchers interested in polyploid ferns, and thus 
serves as a test case for these types of analyses. Based on the results discussed 
above, our success was mixed, and interpretation of our results was greatly 
facilitated by knowing in advance the progenitors of the allopolyploid species. 
We suspect that studies attempting to use a RAD-based approach to determine 
relationships in a polyploid complex where progenitors are not known would 
face substantial challenges and likely obtain at least somewhat ambiguous 
results. Nonetheless, for researchers interested in these methods despite their 
potential shortcomings, there are several aspects of our study, including 
features of the species complex itself and of our experimental design, that are 
informative and which we discuss below. 

As mentioned earlier in the discussion, for both tetraploid complexes there 
was a disparity in our sampling of the parents: both groups had three samples 
from one parent and one sample from the other. When building the pseudo- 
reference genomes for each clade, we attempted to balance the number of 
contigs contributing to the reference from both diploid species equally. This 
was done using a Perl script to find the intersection of contigs from two species 
after the final clustering step. However, at each locus that matched our search 
criteria, we used the consensus sequence created by VSEARCH. In many cases, 
these consensus sequences were built from clusters of contigs that had a 
greater representation of one diploid over the other, making the consensus 
somewhat biased toward one parent. There are a handful of programmatic 
ways to remedy this using custom scripts, but perhaps the most practical and 
effective would be to increase our sample size and better balance sampling 
from the two progenitors. This would not only help create consensus 
sequences that are built from similar numbers of contigs from each species, 
but would also increase the size of the pseudo-reference and consequently the 
number of SNPs that could be called. 

RAD approaches were originally developed to address questions about 
population genetics (Andrews et al., 2016; Rowe, Renaut, and Guggisberg, 
2011), and are therefore most informative across relatively short evolutionary 
distances, typically for individuals whose maximum divergence is between 5 
and 10 mya. GBS approaches have been successful at analyzing hybrid 
complexes that are more recently diverged than our Dryopteris system, for 
example in Juglans (Zhao et al., 2018). In the present study, the two clades of 
Dryopteris are separated by about 40 million years of evolution, and even 
within the hybrid clades, roughly 15-25 million years have elapsed since the 
divergence of the diploid progenitors (Sessa, Zimmer, and Givnish, 2012a, 
2012b). Mutations can start to accumulate in enzyme cut sites that can result in 
non-homologous fragments being cut and amplified across species, which 
becomes more likely the longer ago the species diverged (Eaton et al., 2017). 
This is perhaps part of the reason that our results did not reflect particularly 
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well the known relationships between the tetraploid Dryopteris species and 
their progenitors. 

To investigate the time scales at which RAD approaches are most effective, 
Eaton et al. (2017) examined RADSegq data across several lineages with a range 
of divergence times. They found that phylogenetic distance was not a good 
predictor of the number of SNPs recovered; uneven sequence coverage was 
found to have the most impact on missing data. They also found that as sample 
size increased, some loci that were initially found as singletons in small 
datasets were recovered in more individuals, thereby becoming phylogenet- 
ically informative. This suggests that sample size might be the biggest driver of 
our less than satisfying results, as potentially informative SNPs may only occur 
in one of the sampled individuals. The small size of our dataset, both in terms 
of numbers of samples and numbers of species (two sets of three species), also 
influenced our ability to use additional programs available for analyzing SNP 
data. For example, phylogenetically-informed approaches such as HyDe and 
PhyloNet are unlikely to be informative with so few samples and clusters of 
only three taxa. In some of the datasets examined by Eaton et al. (2017), 
sequencing fewer loci at a higher coverage (essentially the strategy we 
employed in the present study) resulted in large amounts of missing data for 
highly divergent lineages. Even though we did not have a large amount of 
missing data across samples, the high specificity of the double digest method 
may have been problematic for such a highly divergent group as Dryopteris. 

Summary and future directions.—Ferns are known to hybridize across vast 
evolutionary distances (Rothfels, Johnson, et al., 2015; Sessa et al., 2018), and 
so sequencing tools are needed that are informative across diverse and highly 
divergent groups. RAD methods may be more informative for studying 
polyploid or hybrid complexes that have formed more recently than the North 
American Dryopteris complex, but by adjusting the RAD methodology as 
discussed in Eaton et al. (2017), this type of reduced representation may 
indeed be a good option for future investigations of deeply divergent fern 
species and their hybrids (assuming that adequate sample size can be 
achieved). 

In addition to reduced representation methods, there are several next 
generation sequencing techniques that could potentially be utilized to explore 
complexes involving deeply divergent fern lineages. Ultraconserved elements 
(UCEs) have proven useful for investigating deep divergences in animal 
lineages, but are not a viable option for plants due to many ancient polyploidy 
events (Jiao et a/., 2011), which can fracture and rearrange the genome, making 
UCEs difficult or impossible to isolate (Reneker et al., 2012). Target sequence 
capture methods (TSC) and exome sequencing currently seem to be the most 
promising methods for use in this field. TSC methods, which use probes or 
baits to preferentially amplify and sequence specific regions of the genome 
(often low or single copy nuclear genes) have been shown to be effective at 
resolving phylogenetic relationships across ferns (Wolf et al., 2018), and 
several methodologies are available for developing baits from transcriptome or 
genome sequences (Wolf et al., 2018; Yang and Smith, 2014; Zimmer and Wen, 
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2015). Exome capture kits have been designed for several polyploid crop 
plants (Warr et al., 2015), and could potentially be useful for fern population 
genetics as well. Both TSC and exome capture methods provide large amounts 
of data that should be informative across the evolutionary time scales needed 
to investigate the hybrid complexes that are so common in ferns, such as in 
North American Dryopteris. 
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Authors are encouraged to submit manuscripts pertinent to the study of ferns 
and lycophytes for publication in the American Fern Journal. Manuscripts should 
be submitted to http://www.editorialmanager.com/afj/. Acceptance of papers 
for publication depends on merit as judged by two or more referees. Authors are 
encouraged to contribute toward publishing costs; however, the payment or non- 
payment of page charges will affect neither the acceptability of manuscripts nor 
the date of publication. Mandatory fees will be charged for color figures in print 
and/or in electronic versions of published papers. 


Information for authors, including detailed requirements for formatting 
manuscripts can be found online at: https://www.amerfernsoc.org/american- 
fern-journal/. 


FOR INFORMATION REGARDING AVAILABLE PUBLICATIONS, 
VISIT THE AMERICAN FERN SOCIETY’S WEBSITE: 
http://amerfernsoc.org/ 


The “American Fern Journal” (ISSN 0002-8444) is an illustrated quarterly devoted to the general 
study of ferns. It is owned by the American Fern Society, and published at The American Fern Society, 
c/o 926 W. Campus Drive, MC 0406, Blacksburg, VA 24061-1040. Periodicals postage paid at Blacks- 
burg, VA, and additional entry. 

Claims for missing issues, made 6 months (domestic) to 12 months (foreign) after the date of issue, 
and orders for back issues should be addressed to Dr. George Yatskievych, Plant Resources Center, 
University of Texas at Austin, Main Bldg, Rm 127, 110 Inner Campus Dr, Stop F0404, Austin, TX 78712- 
1711; e-mail: george. yatskievych@austin.utexas.edu 

Changes of address, dues, and applications for membership should be sent to the Membership 
Secretary. 

General inquiries concerning ferns should be addressed to the Secretary. 

Back volumes are available for most years as printed issues. Please contact the Back Issues Curator 
for prices and availability. Digital versions of back issues three years or older are available free online 
through the Biodiversity Heritage Library: http://www. biodiversitylibrary. org/bibliography/43943#/ 
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Subscriptions Rates per Volume: 

Personal Journal Membership - (includes Journal and Fiddlehead Forum) $40 


Life Membership $600 
Non-professional Membership - (includes Fiddlehead Forum) $20 
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ferns and lycophytes is beginning to provide a clearer picture of their genome evolution. 


Dryopteris clintoniana (Credit: E. Sessa) with cut-outs, clockwise from top left, of 
LTR-RT insertion times in fern and lycophyte genomes (Credit: A. Baniaga and M. S. 
Barker, Figure 2, this issue), variation in homeolog expression levels in Polypodium 


(Credit: E. Sigel et al., Figure 2, this issue), and a phylogeny of prospective model fern 


genomes (A. R. Petlewski and F.-W. Li, Figure 2, this issue). 


